Developer Productivity Engineering Blog

How Gradle Reduced Build Scan Storage Costs on AWS by 75%

Gradle recently had an opportunity to optimize the cloud storage layer used for Build Scan®, a feature of Develocity. In this article, we’ll take a deep dive into the challenge we faced with inefficient cloud storage, our decision to migrate to Amazon S3, and the remarkable result: a 75% reduction in data storage costs.

Many of our users find that a little internal housekeeping can help reduce operational costs. Here we tell our own story in the hope of motivating similar efforts (and significant savings) for your team and organization.

Challenge: Amazon RDS was getting expensive

Several years ago, we began using Amazon RDS to support our Build Scan feature for Apache Maven and Gradle Build Tool in Develocity. A Build Scan is like an X-ray of your build that provides granular analytic information so you can troubleshoot failures quickly, address performance bottlenecks, and collaborate with colleagues more efficiently. The image below compares two builds using an array of Build Scan metrics.

Amazon RDS is a high-end persistence layer, often used for workloads such as data deduplication across shared databases, data-intensive querying, and complex event processing (CEP). Our use case didn't need most of those capabilities, so relying on RDS was like driving a Formula 1 race car to check the mailbox down the street.

When Gradle began offering free Build Scans to every Maven and Gradle user, as well as full-fledged Develocity instances to popular OSS projects such as Spring Boot, Kotlin, JUnit, Hibernate, Testcontainers, and the entire Apache Software Foundation, our RDS storage requirements skyrocketed.

In the image above (financial figures have been hidden), you can see RDS expenses more than doubling in the course of a single year. We knew something had to be done. However, due to several technical limitations, it wasn’t feasible to migrate to a different persistence layer at the time. 

Interestingly, the answer to our rapidly rising cloud storage costs turned out to be a new feature in Develocity 2022.3.

Solution: Migrate to Amazon S3

One of the many new features released in Develocity 2022.3 (see the release notes) was faster and cheaper Build Scan data storage via S3-compatible object stores. 

With this release, we gave Develocity customers the option of an additional S3-compatible store, aimed at larger installations that produce high volumes of data. As an added benefit, it allowed us to migrate our own scan data to S3 and manage it there. S3 is considerably less expensive than RDS, and the move left us with a much smaller, easier-to-maintain database.
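This article doesn't describe Develocity's internal storage layout, so the following is only an illustrative sketch of the general pattern: each Build Scan payload is written as a single object in an S3-compatible bucket, and the relational database keeps just a small index row per scan. The `build-scans/` key prefix, the `ScanIndexRow` type, and the `store_scan` helper are hypothetical. The only S3 call is the `put_object(Bucket=..., Key=..., Body=...)` method that boto3's S3 client provides; any object with that method (including a test double) will work.

```python
"""Illustrative sketch (hypothetical names): Build Scan payloads go to an
S3-compatible object store, while the database keeps only a small index row."""
from dataclasses import dataclass
from typing import Any

SCAN_PREFIX = "build-scans/"  # hypothetical key prefix


def scan_key(scan_id: str) -> str:
    """Map a scan ID to its object key, e.g. 'abc123' -> 'build-scans/abc123'."""
    if not scan_id or "/" in scan_id:
        raise ValueError(f"invalid scan id: {scan_id!r}")
    return SCAN_PREFIX + scan_id


@dataclass(frozen=True)
class ScanIndexRow:
    """The only per-scan data left in the (now much smaller) database."""
    scan_id: str
    object_key: str
    size_bytes: int


def store_scan(s3_client: Any, bucket: str, scan_id: str, payload: bytes) -> ScanIndexRow:
    """Upload the raw scan payload as one object and return the index row to persist."""
    key = scan_key(scan_id)
    # boto3's S3 client exposes put_object(Bucket=..., Key=..., Body=...).
    s3_client.put_object(Bucket=bucket, Key=key, Body=payload)
    return ScanIndexRow(scan_id=scan_id, object_key=key, size_bytes=len(payload))
```

With boto3, `boto3.client("s3", endpoint_url="https://objects.example.internal")` returns a client for an S3-compatible store (the URL is a placeholder); omitting `endpoint_url` targets Amazon S3 itself.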

In the image below, you can see the effect of this migration. The first drop occurred on January 28, when we removed the RDS database instance. On February 9, a visible spike appeared when we exported backup data to S3 Glacier Deep Archive. On February 13, we removed the old backups. By February 15, costs had settled at about one-third of their previous level.

The migration also removed a technical limitation that had prevented us from intelligently deleting unaccessed and outdated Build Scan data. Since we launched Build Scan, more than 10 million builds have been scanned and analyzed for free, and we had collected and saved every one of them, even scans their submitters never activated or accessed.

Because we no longer had to store data in a shared, deduplicated form, moving to S3 let us automatically delete unactivated scans after a set period (users can simply scan their build again at any time to recover that information within seconds). This reduced our expenses even further.
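The article doesn't say how this automatic deletion is implemented. As a hedged sketch of one common approach on S3: tag each newly uploaded scan object as unactivated, change the tag when the scan is activated, and let a bucket lifecycle rule expire objects that still carry the tag after a retention period. The tag key `activated`, the `build-scans/` prefix, and the 30-day default are assumptions for illustration; the dictionaries follow the shape that boto3's `put_bucket_lifecycle_configuration` expects.

```python
"""Hedged sketch: an S3 lifecycle rule that expires unactivated Build Scans.

Assumptions (hypothetical): scan objects live under 'build-scans/' and carry
the object tag activated=false until a user activates them.
"""


def unactivated_scan_expiry_rule(days: int, prefix: str = "build-scans/") -> dict:
    """Return one lifecycle rule that expires still-unactivated scan objects."""
    if days < 1:
        raise ValueError("lifecycle expiration needs at least 1 day")
    return {
        "ID": "expire-unactivated-build-scans",
        "Status": "Enabled",
        # Match only objects under the prefix that are still tagged as unactivated.
        "Filter": {
            "And": {
                "Prefix": prefix,
                "Tags": [{"Key": "activated", "Value": "false"}],
            }
        },
        # S3 expires matching objects once they are this many days old.
        "Expiration": {"Days": days},
    }


def lifecycle_configuration(days: int = 30) -> dict:
    """Wrap the rule in the LifecycleConfiguration shape used by the S3 API."""
    return {"Rules": [unactivated_scan_expiry_rule(days)]}
```

Applying it would look like `s3.put_bucket_lifecycle_configuration(Bucket="my-scan-bucket", LifecycleConfiguration=lifecycle_configuration(30))`, and activating a scan would overwrite the object's tag set via `put_object_tagging`. Because S3 evaluates lifecycle rules asynchronously on its side, the application never has to walk the bucket looking for stale scans.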

Results: Immediate 75% reduction in cloud expenses 

Using conservative estimates, Gradle reduced cloud expenses by 75% by migrating from Amazon RDS to S3. Our engineers expect further cost reductions in the future. For example, we currently keep object versioning turned on, a cautious choice that guards against accidental deletions while we continue to refine intelligent expiry and deletion of unactivated scans.
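With versioning enabled on an S3 bucket, deleting or expiring an object only adds a delete marker, so the previous version stays recoverable until it is permanently removed. That safety net costs storage, which is why it leaves room for later savings. The minimal sketch below, assuming boto3 and a placeholder bucket name, turns versioning on and adds a lifecycle rule that purges noncurrent versions after a grace period. The 14-day window, the rule ID, and the function names are illustrative assumptions rather than the settings described in this article.

```python
"""Hedged sketch: versioning as a safety net for scan expiry and deletions.

The grace period and names are hypothetical; boto3 is imported lazily so the
configuration helpers can be used without it installed.
"""


def versioning_configuration() -> dict:
    """Shape expected by put_bucket_versioning(VersioningConfiguration=...)."""
    return {"Status": "Enabled"}


def noncurrent_cleanup_rule(grace_days: int = 14) -> dict:
    """Permanently delete versions once they have been noncurrent for grace_days."""
    if grace_days < 1:
        raise ValueError("grace period must be at least 1 day")
    return {
        "ID": "purge-noncurrent-scan-versions",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # apply to the whole bucket
        "NoncurrentVersionExpiration": {"NoncurrentDays": grace_days},
        # Remove delete markers that no longer hide any remaining versions.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


def enable_versioned_cleanup(bucket: str, grace_days: int = 14) -> None:
    """Enable versioning and install the cleanup rule (requires AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration=versioning_configuration()
    )
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration={"Rules": [noncurrent_cleanup_rule(grace_days)]},
    )
```

One caveat: `put_bucket_lifecycle_configuration` replaces a bucket's entire lifecycle configuration, so any other rules (for example, an expiry rule for unactivated scans) would need to be submitted in the same `Rules` list. Shortening `grace_days`, or eventually suspending versioning, trades recoverability for lower storage costs.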

Develocity Platform Services include Performance & Scalability, Security, Integration, and High Availability. AWS S3 support is one of many Performance & Scalability capabilities. We invite you to learn more about how all these elements of Develocity ensure that your deployment scales and remains highly performant.