At Netflix, improving developer happiness is a goal of paramount importance since it is highly correlated with developer productivity. Netflix Engineering has experienced significant benefits from their investments in improving the developer experience and productivity by using automation, tools and data rather than by management decree and best practices. And they have some hard metrics to prove it.
For example, the productivity team reduced a 62-minute test cycle time to just under 5 minutes in one application using a Developer Productivity Engineering (DPE) technique called Test Distribution. This means developers spend more time creating great code and less time waiting for builds and tests to complete. This result, along with the other productivity insights reviewed in this post, was discussed in a recent DevProdEng LowDown webcast led by Danny Thomas, a tech lead on the Netflix Developer Productivity team. You can view the webcast here.
DPE is an Enabler of Developer Happiness
Netflix Engineering came to embrace the practice of DPE, but not before realizing just how inured many engineers are to problems in the toolchain that DPE solutions address. They had come to accept friction and inefficiencies in the development process as a given. According to Danny:
“It is staggering how tolerant engineers are of toil and frustration and friction.”
DPE proves that it doesn’t have to be this way. DPE uses acceleration technologies like Test Distribution to speed up the software build and test process, and data analytics to hone build and test performance and make troubleshooting more efficient. For an organization, the results can be transformative: significant productivity gains and a better developer experience.
Danny works within the Developer Productivity organization, but for some the name may be misleading, since it may dredge up visions of draconian IDE extensions, counting lines of code, and leaning on developers to push out more story points. The reality is that Developer Productivity is responsible for improving the developer experience by making the tools developers use faster and more reliable.
Test Distribution Reduces Test Cycle Times by Introducing Parallel Test Execution Across Machines
Most recently, Netflix Developer Productivity has been focused on making test execution phases complete more quickly. Danny offered two reasons for this:
“It’s great to have quick feedback from CI, but also bear in mind that long test times or resource intensive tests cause engineers to want to push code to CI for verification which takes those engineers out of their local feedback loop. Our hope is that by improving performance and parallelism and the speed of that feedback, we can have folks running tests locally as much as possible.”
Key to the Developer Productivity/JVM Ecosystem team’s efforts in this regard has been leveraging DPE-enabling technologies like Test Distribution. Test Distribution fans out tests to multiple test agents in scalable pools so that test frameworks can take advantage of parallelism. These resource pools can be deployed cloud-scale across many different substrates, allowing users to trade additional compute resources for faster test feedback cycles. Considering the increased cost of engineering resources against a now commoditized cloud services industry, it’s easy to see how a solution like this can lead to immediate ROI.
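To make the fan-out idea concrete, here is a minimal sketch in Python. It is not how Gradle Enterprise schedules tests. It is a simple "longest test first" greedy assignment, and the function names `distribute` and `wall_clock` are hypothetical. It only illustrates why adding agents shortens the wall-clock time of a test run, and why the longest single test sets a floor on that time.

```python
import heapq


def distribute(test_durations, agent_count):
    """Greedily assign tests to agents, longest test first.

    test_durations maps a test name to its duration in minutes.
    Returns one list of test names per agent. This is an illustrative
    heuristic, not the scheduling algorithm of any real product.
    """
    if agent_count < 1:
        raise ValueError("agent_count must be at least 1")

    # Min-heap of (minutes already assigned, agent index), so the
    # least-loaded agent is always popped first.
    heap = [(0, index) for index in range(agent_count)]
    heapq.heapify(heap)
    plans = [[] for _ in range(agent_count)]

    longest_first = sorted(
        test_durations.items(), key=lambda item: item[1], reverse=True
    )
    for name, minutes in longest_first:
        load, index = heapq.heappop(heap)
        plans[index].append(name)
        heapq.heappush(heap, (load + minutes, index))
    return plans


def wall_clock(test_durations, plans):
    """Return the elapsed time of the run: the busiest agent's total."""
    return max(
        (sum(test_durations[name] for name in plan) for plan in plans),
        default=0,
    )
```

Calling `wall_clock(durations, distribute(durations, n))` with increasing `n` shows the trade described above: more compute buys a shorter feedback cycle, until the run is bounded by its single slowest test.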
When we asked Danny why his focus has been mostly on improving test cycle times, rather than other parts of the developer experience, he explained that the decision was based on the developer experience metrics his team collected:
“We know from our existing metrics that up to 90% of build time is spent in tests, so any kind of avoidance or speed improvement can make a massive difference. Making test feedback faster and allowing tests to run locally will mean developers will run tests more frequently, speeding up the local development feedback loop, rather than depending on CI environments.”
Gradle Enterprise Provides Comprehensive Test Distribution Capabilities
Gradle Enterprise includes a full solution set for enabling Test Distribution, which has helped Netflix Engineering augment their toolchain quickly and scale to serve their colossal global consumer market. Industry-standard deployment options for the test agents are available to support the need for scale. Elastically scaled Kubernetes deployments are supported natively through horizontal pod autoscaling and can be deployed conveniently using the popular KEDA autoscaler component. Alternatively, the test agent is available as a standalone .jar distribution that does not depend on a particular platform vendor.
The results for Netflix have been promising to say the least. Danny doesn’t hide his team’s excitement.
“For one project we reduced the build time from 60 minutes to 5 minutes just using Test Distribution across multiple machines. Launching all those tests from IntelliJ, and have it delegate to Gradle Enterprise and watch the parallel results just stream while your CPU is barely affected… it’s just really something. So we think that Test Distribution will really move the needle and improve the test experience for everyone.”
The Developer Productivity team shared a Build Scan™ from this project that shows a 1-hour, 2-minute build time for a project without Test Distribution:
Here is that same project built with Test Distribution, taking just under 5 minutes to execute:
To further quantify the impact, this enhancement effectively allows a developer to run tests more than 10x as often as before, and these results came from implementing just one of several fundamental DPE technologies. Other DPE solutions exist which augment one another and involve caching techniques, analytics, enhanced reporting, and machine learning.
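The arithmetic behind that claim can be checked with a short Python sketch. The 62-minute and 5-minute figures come from the Build Scans above, with 5 minutes used as an upper bound for "just under 5 minutes". The 480-minute working day and the function names are assumptions made purely for illustration.

```python
def speedup(before_minutes, after_minutes):
    """Return how many times faster the test cycle became."""
    return before_minutes / after_minutes


def runs_per_day(cycle_minutes, workday_minutes=480):
    """Return how many complete test cycles fit back to back in a day.

    The 480-minute default (an 8-hour day) is an illustrative assumption.
    """
    return workday_minutes // cycle_minutes


if __name__ == "__main__":
    print(speedup(62, 5))
    print(runs_per_day(62), runs_per_day(5))
```

With these inputs the cycle is roughly 12 times faster, and the number of full test runs that fit into the assumed working day rises from 7 to 96.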
Leading Technology Companies are Investing Heavily in Developer Productivity
Netflix is one of many leading technology companies that have, over the last few years, formally introduced a centralized developer productivity team and institutionalized DPE as a standard software development practice. Alongside organizations such as Twitter, LinkedIn, Google, and Microsoft, Netflix Engineering has built an executive-sponsored initiative whose sole focus is to employ engineering solutions to improve the developer experience and cascade improvements to all levels of the organization.
Danny recommends that businesses seek out team members who, perhaps in an unofficial capacity, make contributions to the toolchain and infrastructure that help with overall developer experience and productivity. They should be allowed to focus on these actions as a full-time responsibility and given the resources and accountability necessary to formally implement DPE practices.
“Really, Productivity Engineering is recognizing the importance of not spreading those responsibilities across the entire organization and ensuring that you know, somebody is on the hook for ensuring the productivity of the team.”
When we closed our discussion with Danny by asking for any parting advice for other organizations that would like to begin implementing DPE, he offered this:
“DPE is not a Bay Area tech thing, it can work everywhere. You should start now, yesterday even. The thing is, you probably already have productivity engineers, you just might not be calling them that. You certainly have people in your teams that spend a significant amount of time on the build and on CI deployments and on making development more productive. That’s what they care about and that’s their passion.”