One of the most powerful features of Gradle Enterprise is its remote build cache. It allows teams to share the benefits of caching, even for local builds. If you have code that hasn’t changed, there’s no reason to rebuild it.
TL;DR: To manage and maintain a remote build cache at scale, you need an easy way to observe its performance continuously, especially if you’re using the remote cache for local builds. This post looks at how the Cash App team used the Gradle Enterprise Performance Dashboard to observe remote build cache performance and track avoidance savings per build type. It then shares how they used the custom metadata capability in Build Scan™ to get more detailed insights into how the remote build cache was performing for team members located around the world. This gave them the data they needed to optimize the remote cache for the entire team. As you will see, having the data was key to taking emotion out of business decisions.
A remote build cache is a simple idea, but operating it at scale is not that simple.
About Cash App and their remote build cache challenge
Cash App is Square’s finance application. There are 60+ developers on the Android version of this application, so their builds are relatively complex. The team, led by software engineer John Rodriguez, was looking for proactive ways to make the Cash App build faster. One central tenet of Developer Productivity Engineering (DPE) is to be proactive and not wait for developers to complain before addressing build and test time issues. Good DPE practitioners continuously work to make builds and tests faster.
The Gradle Enterprise remote build cache delivers a shared build cache to an entire team. In the Cash App team’s infrastructure, the CI server seeds the build cache every time a CI job runs. CI jobs also pull from the remote build cache, which substantially accelerates their CI builds. When developers start their day, the first build of the day is faster because it pulls from the remote build cache.
Note: A build cache is different from, but complementary to, a dependency cache such as JFrog’s Artifactory or Sonatype’s Nexus. In a nutshell, a build cache makes building code from source faster by letting you avoid recompiling code that hasn’t changed. A dependency cache makes Gradle tasks and Maven goals run more quickly by letting you avoid downloading dependencies again. Check out this blog post on build caches and dependency caches if you want to learn more.
Measuring avoidance savings with Gradle Enterprise and custom metadata
With the remote build cache turned on, Cash App needed to focus their observations on the data from expensive local builds (like Lint builds or assembleDebug builds) rather than on all local builds. Looking at all local builds would add too much noise to the data set by including faster incremental builds. The average assembleDebug time across the organization was faster, but not significantly. The team focused on avoidance savings: the time saved by using the remote cache instead of (re-)building code locally.
Note: This section looks at the Gradle Enterprise features the Cash App team used to analyze remote build cache node performance. Cash App keeps its Gradle Enterprise instance private, as you would expect, so we’re using screen captures from the publicly shared Gradle Enterprise instances of the Spring Boot project (ge.spring.io) and Kotlin from JetBrains (ge.jetbrains.com).
The Performance Dashboard shows sample avoidance savings summarized across a number of builds:
On average, builds for this server had an avoidance savings of 1 minute and 48 seconds. As you can see, the dashboard gives you details about the distribution of results in addition to the average.
Gradle Enterprise also shows you the avoidance savings for an individual build:
This particular build saved more than 1 hour and 20 minutes using the remote build cache. Clearly, significant savings were achieved with this build.
The Trends Dashboard features a graph showing how builds are performing and how avoidance savings are trending over time:
You can quickly see if avoidance savings are trending up or down or staying about the same. Any changes in this data are worth investigating.
Looking at more specific data, some builds had negative avoidance savings. In other words, some builds were actually slower using the remote cache. Here’s what negative savings look like:
In this example, using the remote build cache actually made the build more than ten minutes slower. Should we turn off the cache for this machine? For this project? For everyone? With the Gradle Enterprise UI, you can easily observe builds that become slower due to the remote build cache.
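One way to reason about negative savings is the simplified model sketched below. It is an assumption made for illustration, not Gradle Enterprise’s actual calculation: each cache hit saves the time the task would have taken to run locally, minus the time spent downloading and unpacking the cached output. The record type, task names, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheHit:
    """One task whose output came from the remote cache (hypothetical record)."""
    task: str
    local_execution_s: float  # estimated time to run the task locally
    fetch_overhead_s: float   # time spent downloading and unpacking the entry


def avoidance_savings_s(hits):
    """Sum of (time avoided - cache overhead) over all cache hits, in seconds."""
    return sum(h.local_execution_s - h.fetch_overhead_s for h in hits)


# Illustrative numbers only: the same two tasks on a fast and a slow connection.
near_cache = [
    CacheHit(":app:compileDebugKotlin", 95.0, 4.0),
    CacheHit(":app:lintDebug", 120.0, 6.0),
]
far_from_cache = [
    CacheHit(":app:compileDebugKotlin", 95.0, 140.0),
    CacheHit(":app:lintDebug", 120.0, 180.0),
]

print(avoidance_savings_s(near_cache))      # positive: the cache helps
print(avoidance_savings_s(far_from_cache))  # negative: fetching costs more than rebuilding
```

Under this model, savings turn negative as soon as fetching an entry takes longer than rebuilding it, which is why network latency to the cache node matters so much.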
At Cash App, most builds were faster, but some weren’t. Why? In looking for answers, John Rodriguez came up with a clever idea. He decided to add custom values, such as the machine’s location, to the Build Scan for every local build.
Capturing Geolocation of Local Builds with Gradle Enterprise
The remote build cache node was physically located on the West Coast. Still, the team has members in Ontario, Canada; Melbourne, Australia; San Francisco; New York; and Seattle. Obviously, latency for some users could be significant. Using custom values and the ability of Gradle Enterprise to analyze build data, the team could see just how significant that latency was. Without geolocation data across all local builds, the team could not have observed the benefits they could gain from the remote build cache. They also wanted to see any patterns that decreased the avoidance savings from the remote build cache.
Here’s some pseudocode that generates the geotagged information associated with each build:
ip = [this machine’s external ip address from https://ipinfo.io/ip]
geo = [this node’s geographical properties from https://geoiplookup.io/$ip]
Isp = $geo.isp
City = $geo.city
. . .
State = $geo.region
Timezone = $geo.timezone
This automates tagging based on each machine’s IP address. The team could use the metadata above to analyze build data based on other factors. For example, $geo.isp returns the name of this machine’s internet provider; it’s possible that the ISP could cause some slow build times.
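The pseudocode above could be fleshed out roughly as follows. This is a minimal Python sketch, not the Cash App team’s implementation: the helper names are hypothetical, only the four fields named in the pseudocode are mapped, and it assumes the geolocation endpoint returns JSON (check the service’s documentation for the exact URL format). The network lookup is defined but not called, so the example runs offline against a hand-written record.

```python
import json
import urllib.request


def fetch_text(url, timeout=5):
    """Return the body of an HTTP GET as a stripped string."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def lookup_geo():
    """Look up this machine's external IP, then its geolocation record.

    Assumption: the second URL (taken from the pseudocode) returns JSON.
    """
    ip = fetch_text("https://ipinfo.io/ip")
    return json.loads(fetch_text(f"https://geoiplookup.io/{ip}"))


def geo_custom_values(geo):
    """Map a geolocation record to key/value pairs for a Build Scan.

    Missing fields become "unknown" so every build carries the same keys.
    """
    fields = {"Isp": "isp", "City": "city", "State": "region", "Timezone": "timezone"}
    return {key: str(geo.get(source) or "unknown") for key, source in fields.items()}


# Offline example with a hand-written record (not real data).
sample = {
    "isp": "Example ISP",
    "city": "New York",
    "region": "New York",
    "timezone": "America/New_York",
}
print(geo_custom_values(sample))
```

Filling missing fields with a placeholder keeps later filtering simple, since a failed lookup still produces a build with the same set of keys.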
The custom values look like this in a Build Scan:
Analyzing the data
Gradle Enterprise makes it easy to observe build statistics based on custom values. For example, you can filter the data to see just the builds from New York:
This allowed the Cash App team to find specific developers whose builds frequently had negative avoidance savings and disable the remote build cache for their machines.
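As a rough illustration of that kind of analysis, the sketch below flags machines whose builds had negative avoidance savings more than half the time. The data layout, machine names, and thresholds are assumptions made for this example; in practice the numbers would come from exported build data.

```python
from collections import defaultdict


def frequently_negative(builds, min_builds=5, threshold=0.5):
    """Return machines where more than `threshold` of builds had negative savings.

    `builds` is an iterable of (machine, avoidance_savings_seconds) pairs.
    Machines with fewer than `min_builds` builds are skipped as too noisy.
    """
    totals = defaultdict(int)
    negatives = defaultdict(int)
    for machine, savings_s in builds:
        totals[machine] += 1
        if savings_s < 0:
            negatives[machine] += 1
    return sorted(
        machine
        for machine, count in totals.items()
        if count >= min_builds and negatives[machine] / count > threshold
    )


# Made-up sample data for three hypothetical machines.
sample_builds = (
    [("dev-a", s) for s in (120, 90, -5, 200, 150)]
    + [("dev-b", s) for s in (-30, -45, 10, -60, -20)]
    + [("dev-c", s) for s in (-10, -20)]
)
print(frequently_negative(sample_builds))
```

The minimum build count guards against disabling the cache for a machine based on one or two unlucky builds.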
Next, the team sorted through the data and realized that team members on the East Coast benefited from the remote cache less than their West Coast peers. However, the remote cache delivered great results in the San Francisco area.
With the tags added to their build data, they used Gradle Enterprise reporting features to discover that East Coast users were saving an average of 38 seconds of wall clock time per build. On the other hand, developers in San Francisco were saving, on average, more than 3 minutes and 10 seconds per build. That’s a difference of just over 2½ minutes per build.
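The comparison amounts to grouping builds by a custom value and averaging their savings. Here is a minimal sketch, assuming each build is a dictionary with hypothetical `custom_values` and `avoidance_savings_s` fields. The sample numbers are invented so that the averages match the 38 seconds and 3 minutes 10 seconds quoted above.

```python
from collections import defaultdict
from statistics import mean


def mean_savings_by(builds, key):
    """Average avoidance savings (seconds) per distinct value of a custom key."""
    groups = defaultdict(list)
    for build in builds:
        groups[build["custom_values"][key]].append(build["avoidance_savings_s"])
    return {value: mean(savings) for value, savings in groups.items()}


# Invented sample builds, chosen only to reproduce the quoted averages.
sample_builds = [
    {"custom_values": {"City": "San Francisco"}, "avoidance_savings_s": 200},
    {"custom_values": {"City": "San Francisco"}, "avoidance_savings_s": 180},
    {"custom_values": {"City": "New York"}, "avoidance_savings_s": 40},
    {"custom_values": {"City": "New York"}, "avoidance_savings_s": 36},
]

by_city = mean_savings_by(sample_builds, "City")
gap_s = by_city["San Francisco"] - by_city["New York"]
print(by_city)
print(f"gap: {gap_s // 60:.0f} min {gap_s % 60:.0f} s")
```

With these averages the gap is 190 − 38 = 152 seconds, or 2 minutes 32 seconds, which is the "just over 2½ minutes" figure.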
In looking at solution options, one way to boost performance is to add a second cache node on the East Coast. If East Coast users experienced the same avoidance savings, that would be beneficial. However, would it be enough to justify the cost of the second node? Given the number of similar builds run each week on the East Coast, the time savings per build, and the labor cost of their developers, the team could calculate a hard ROI for a second node.
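A back-of-the-envelope version of that calculation might look like the sketch below. Only the 152 seconds of extra savings per build (3 minutes 10 seconds minus 38 seconds) comes from the figures above. The build volume, hourly labor cost, and node cost are placeholders, not Cash App’s numbers.

```python
def weekly_value_usd(builds_per_week, extra_savings_s, hourly_cost_usd):
    """Developer time recovered per week, expressed in dollars."""
    hours_saved = builds_per_week * extra_savings_s / 3600
    return hours_saved * hourly_cost_usd


def roi(weekly_value, weekly_node_cost):
    """Return on investment: net weekly benefit divided by weekly cost."""
    return (weekly_value - weekly_node_cost) / weekly_node_cost


# 152 s is the gap quoted in the article; everything else is a placeholder.
EXTRA_SAVINGS_S = 190 - 38
value = weekly_value_usd(builds_per_week=500, extra_savings_s=EXTRA_SAVINGS_S, hourly_cost_usd=100)
print(f"weekly value: ${value:,.2f}")
print(f"ROI: {roi(value, weekly_node_cost=200):.2f}x")
```

The result is only as good as its assumption that East Coast builds would reach West Coast savings with a nearby node, so it is worth rerunning with measured numbers once the node is in place.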
“If you’re going to spend good money, you should make sure there’s data, and Gradle Enterprise delivered that data,” said John Rodriguez.
Finally, the team used a Gradle property to automatically turn the remote build cache on or off for each machine. Team members who didn’t benefit from the remote build cache (the Australian team members, for example) had the property set to disable the cache. That made the remote build cache as efficient as possible, benefiting most users but not negatively affecting those with negative cache avoidance savings.
Gradle Enterprise’s remote build caching is an extremely powerful feature that can make your development teams much more efficient. Gradle Enterprise’s observability enables data-driven decisions based on precise results gathered from thousands of local builds across the entire team. And the ability to add custom values to each build enabled Cash App to measure the avoidance savings of the remote build cache for specific geographical locations. This is just one example of how Gradle Enterprise’s observability allows you to make the most of your build infrastructure investment.
- Cash App + Gradle: A Journey in Android Developer Productivity at Cash App by John Rodriguez
- Presentation video – the content covered in this blog starts at the 16:46 mark
- Presentation slides
- Gradle Enterprise documentation: Adding custom values to a Build Scan
- Plugin: How to retrieve data from the Internet and create custom Build Scan values from it (by Iñaki Villar). See https://github.com/cdsap/IpInfo for the source code and complete instructions.