Developer Productivity Engineering Blog

Developer Productivity Engineering

In this webinar, Hans Dockter, CEO & Founder of Gradle, talked about the emerging practice of developer productivity engineering, a discipline of using data to improve essential development processes from build/test to CI/CD.

Topics covered in this webinar were:

  • Quantify the costs of a low productivity environment with wasted time waiting for builds, tests, and CI/CD pipelines
  • Communicate the importance of fast feedback cycles and catching errors earlier, including incorrect signals like flaky tests
  • Discuss acceleration technologies for speeding up feedback cycles
  • Make the practice of developer productivity engineering a respected discipline
  • Use data to understand and improve essential development processes

You can access the slides of the webinar here.

Thanks to everyone who attended the webinar, and thanks for your questions!

Hans: Hello, everyone. My name is Hans Dockter. I’m the founder of Gradle, and I welcome you to this webinar today from our Gradle office in San Francisco. The topic is developer productivity engineering, and I hope you will enjoy it.

We will talk about the case for developer productivity engineering, the importance of fast feedback cycles, and how to make developers more productive with fewer incidents and better support. Those are the three main topics for today.

So let’s start with the case for developer productivity engineering. What is very important to understand is that we all know software development is a creative process. But it’s less the creativity of an artist, it’s more the creativity of a scientist. A scientist has a hypothesis and then enters a dialogue with nature via experiments to see whether his hypothesis is correct or not, and it influences his thinking about the world and about how to model it.  And for software developers and software engineers, it’s similar, but the dialogue is with the ToolChain. So our hypothesis is a code change, and then we ask the ToolChain, which means the compiler, checkstyle, unit tests, performance tests, integration tests, or the running software itself. We need the feedback to see whether this code change is doing the right thing.

And the quality of the creative flow of a developer depends a lot on the quality of the dialogue with the ToolChain. That includes how fast you get a response and how correct that response is. Ideally, you get answers instantaneously, and the answer is always reliable. For many of us, that was the world when we started coding. We had our development environment, and when we changed the color of the circle from red to blue, we saw that change immediately. And this instantaneous response from the ToolChain was pivotal to having this creative flow and to really getting hooked on software development. One question that would be interesting to think about: if it had taken five minutes for the circle to change color, how many of us would have continued with coding? Maybe quite a few would have lost the joy and would never have taken this up as a profession. But things have changed. Now we’re developing software in bigger teams. We are doing enterprise software development. And that is creating a unique set of challenges we haven’t had before.

So one is that there is a new dialogue entering our work. That dialogue is about collaboration. We need to collaborate with the business experts and customers, and we interpret their ideas via code. We need fast feedback cycles and an effective dialogue with the business expert or the customer to see whether our interpretation of the idea is what they had in mind. And again, the effectiveness of that collaboration depends on how quickly you can iterate, and the ToolChain is pivotal in how quick those iterations can be.

So, when you think about team productivity, the productivity of your software development team, it’s determined by the quality of the creative flow of your developers and the collaborative effectiveness. And the reality is that both of those are heavily influenced by the ToolChain. And when you do enterprise software development in medium or larger teams, the developer ToolChain is very complex machinery with complex inputs. It affects the speed of iterations, the feedback cycles, and the reliability of the feedback.

So, when you do enterprise software development, ToolChain efficiency is a key enabler of creative flow and collaborative effectiveness. If the ToolChain efficiency is very low, there’s not much you can do. You will not have a good creative flow and you will not have good collaborative effectiveness, regardless of anything else you might be doing to improve that.

I have worked closely over the last few years with hundreds of build teams and developer productivity teams, and what I can say is that the ToolChain efficiency ranges from mediocre to bad to terrible. That’s what we see. We don’t see good or very good. So what can we do to change that? One thing that is very important to keep in mind: as your project becomes successful, it puts severe pressure on the ToolChain. Lines of code, number of developers, number of repositories, number of dependencies, and the diversity of your tech stack are always growing, often exponentially, and if you don’t take active measures to make your ToolChain efficiency scale with that, it will collapse. And the return you get from investing more into your development will become marginal because of the adverse effects that growth has on the ToolChain. So even if you think you are in OK shape right now, or kind of mediocre right now, as you grow, it will turn bad and then terrible if you don’t do something about it.

So, to address that, we need the discipline of developer productivity engineering. It’s not a completely new thing. In some respects, some of that work is already done in many organizations, but very often it’s not a clear focus. So what we need to practice this discipline is a team of experts whose sole focus is on optimizing the effectiveness of the developer ToolChain. The objectives are a high degree of automation, fast feedback cycles, and correctness of the feedback. If things are not automated, you cannot accelerate them and you cannot make them more reliable. That’s the worst state to be in. And if there is no clear responsibility for that, if there is not a dedicated effort, you will have the opposite. 100%, right? This doesn’t happen automatically, and it doesn’t happen as a side job.

If you want to achieve those three objectives, you have to have people who have a dedicated focus on that. And you need a culture where the whole organization commits to an effort to improve developer productivity, where you can have a reasonable discussion about, say, disabling the virus scanner for the build output directory. And very, very important, the priorities and success criteria for developer productivity engineering are primarily based on data that comes from a fully instrumented ToolChain. They are not based on who complains the loudest or on some gut feeling. ToolChains are way too complex, and if you don’t have data, you will not be able to do the right thing.

Having said that, I’m curious, we want to do a poll, and I want to ask the first question. “What are the most challenging/difficult parts in your build process?”

And that’s typically what we are seeing, slow build/test time is often the number one problem. But the other very important problem is build reliability, debugging time, finding/resolving dependencies belongs in that area. That is basically the other 50%. So that is in short what we usually see. Great, thank you.

Those issues, slow build and test time, debuggability issues, all those ToolChain issues we will dive into in more detail, are the reason why development teams work far from their true potential. So there is a significant gap between the actual productivity of software development teams and what they would be able to do if the ToolChain were more efficient. And the gap, as already mentioned, grows over the lifetime of a project for most organizations. But we have seen many times that the gap can be significantly reduced when the practice of developer productivity engineering has a clear focus.

And I just want to share a story. It sounds so obvious that you have to do that, but what we see many times nowadays is that when we visit an organization, we meet with the CI team and we meet with the application teams, and the CI teams don’t own developer productivity. They own things like auditability and security compliance. So what they often do is add tons of checks to achieve their goals, which dramatically increases CI build time. But they’re not incentivized to make CI builds faster; they are incentivized to maximize security compliance. So they sometimes go over the top when it comes to that, and there is no balance in the organization to say, hey, developer productivity is a thing. That is something that needs to be taken into account when we make decisions about doing additional checks, et cetera. And we see this all over the place. There is frustration and there is friction between those teams, but because there is no clear responsibility, it doesn’t get resolved.

And a very, very, very important point is, most developers want to work in an environment that enables them to work at their full potential. There might be a few who kind of enjoy an unproductive environment but that’s the exception. And very important for organizations is that if they cannot provide such an environment, they will lose talent.

There is this story where a Netflix executive is talking with an executive of a Wall Street bank, and the guy from the Wall Street bank said, hey, if we only had your quality of developers, then we could compete completely differently when it comes to software. And the Netflix guy said, well, guess where we got our developers from? We recruited them. We went to the East Coast. We did Java user group events to show them, hey, this is where you can live up to your potential. And that is a key deciding factor for many developers in where they want to work. That’s very important. Losing talent is the worst that can happen to your developer productivity.

And from a business point of view, developer productivity improvements provide such high leverage for every dollar invested in software development. Most money goes into the salaries of the software developers, and they are hard to find. So the leverage you can get is tremendous. And if you don’t do that, you have a significant competitive disadvantage. All business innovation nowadays needs to be funneled through software. And innovation is blocked, business innovation is blocked if you’re not productive enough with your software development. We all know that, so it’s almost crazy if you don’t do any dedicated effort to improve your productivity. Every other industry is investing in productivity, and software development teams have to do the same thing.

And then there are the challenges of organizational change. So that’s interesting. We do surveys, sometimes together with our partners in the ecosystem. Let’s say developers need to make a case: hey, we need to modularize our codebase more and we need to reduce the number of flaky tests. What developers tell us is that they very often struggle with how to make the case to their manager for why this is a necessary investment. And if they cannot make a clear case, it’s postponed, it’s not addressed. A big part of making the case is quantifying the benefits. If you’re not able to do that to some degree, it’s hard for a decision-maker to say, OK, please make that investment. So, what we often see: let’s say an organization is not in good shape and developers are very unhappy with how the ToolChain works. Let’s say you now make a dedicated effort and you improve build and test time by 20% and reduce flaky tests by 25%. If you just do a survey with the developers, they would just say, oh, I’m not happy. It should be better. But if you can quantify it, you can say, yes, I know we’re not in paradise yet, but this is the actual improvement we made. It completely changes the conversation if you can quantify, because then you can show progress based on numbers and not just on gut feel.

Fast feedback cycles are very important. Of the people attending the webinar, 50% already said that’s their biggest problem. Here, too, it’s very important to make the case, to look in depth at why it is so important and to have a very strong argument. One thing I found fascinating when I saw it for the first time is that faster builds improve the creative flow.

We work with organizations that have many teams, but let’s just look at team one and team two at one organization. Team one had 11 developers and team two had six developers. The build and test time of team one was four minutes, and of team two, one minute. With a four-minute build and test time, people do not go to the CEO and say, hey, I cannot work here anymore. But the very interesting thing you can see is that team two has twice as many local builds and test runs as team one. So they ask for feedback much more often. They have a much better dialogue with the ToolChain. And even though four minutes is not a terrible build time, if you got the build time down to three minutes, we would probably see a higher number of local builds, a healthier dialogue with the ToolChain, and a more creative process for those developers. Once you see the numbers, you are like, wow, that’s very interesting. So, even if your builds are not considered slow, getting them down from four minutes to three minutes makes a big difference. Getting them down from one minute to 45 seconds makes a difference. So that is the creative flow.

But of course, there is another, more obvious effect of build time: waiting time. Builds that are faster than 10 minutes cause significant waiting time, because those are usually builds and tests where developers are idle, waiting for the build to finish. Very often, it’s not worth switching to a different task. And the aggregated cost of waiting is surprisingly high, even for very fast builds. We mentioned that team with six developers and 1,000 builds per week. We managed, with just a little bit of investment, to get their build time down from one minute to 0.6 minutes. For this team, the reduced waiting time meant 44 more engineering days per year for the whole team. That’s a very significant number. One minute to 0.6 minutes doesn’t sound very sexy, but if you do the math with 1,000 builds per week, it’s actually very significant. And if you have a longer build time, let’s say a 100-person team where you get it down from nine minutes to five minutes, it can increase productivity by multiple hours per developer per day. We’ve seen that. One thing to keep in mind is that an unreliable ToolChain substantially increases waiting time. If you have a lot of flaky tests or weird issues with the build, the average time for the feedback cycle increases significantly.
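To make the waiting-time math concrete, here is a minimal sketch in Python. The 52 working weeks and the 8-hour engineering day are assumptions, since the talk does not say which values were used. With them, the six-developer team comes out at roughly 43 days per year, in the same range as the 44 quoted.

```python
def engineering_days_saved(builds_per_week, minutes_before, minutes_after,
                           weeks_per_year=52, hours_per_day=8):
    """Engineering days per year recovered by a shorter build.

    weeks_per_year and hours_per_day are assumptions; the talk does not
    state which values were used for its figure.
    """
    minutes_saved_per_week = builds_per_week * (minutes_before - minutes_after)
    hours_saved_per_year = minutes_saved_per_week * weeks_per_year / 60
    return hours_saved_per_year / hours_per_day


# The six-developer team from the talk: 1,000 builds a week, 1 minute down to 0.6.
team_two_days = engineering_days_saved(1000, 1.0, 0.6)
print(f"{team_two_days:.1f} engineering days per year")
```

The same function can be fed your own build counts and durations to get a first estimate before investing in build acceleration.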

So you might say, OK, then just make the build longer than 10 minutes and no one is waiting anymore and we are all fine. Obviously, that is not a good idea. As build time increases, people switch more and more to different tasks while the build is running. And now the cost of context switching has to be paid. Not always. Let’s say I finish a feature, I fire off the build, and everything is green. Then I don’t have to pay the cost of context switching. But whenever the build fails and I have to go back, I have to pay that cost. And what we usually see is that about 20% of all builds fail. So it happens very frequently that the build fails and you have to figure out what is going on. You also have to go back to the previous task when you triggered the build as an intermediate step, to see whether things are working so far, and if everything is green, you continue with that work. Context switching often costs 10 to 20 minutes. It depends, of course, but it’s a significant amount of time. And it has to be paid twice: once when you start the new task and again when you go back to the previous task. And again, an unreliable ToolChain substantially increases this cost. Let’s say you have a flaky test, so your build fails. You have to go back. You figure out, oh, it’s flaky, everything was OK. Then you have to go back to the new task. You get the idea.
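As a rough illustration of how these numbers combine, the sketch below computes an average context-switching cost per build. It uses a deliberately simplified model that is an assumption, not something stated in the talk: a green build costs nothing, and a failed build costs two switches.

```python
def expected_switch_minutes_per_build(failure_rate, minutes_per_switch,
                                      switches_per_failure=2):
    """Average context-switching minutes per build.

    Simplified model (an assumption): only failed builds cost anything,
    and each failure costs `switches_per_failure` switches.
    """
    return failure_rate * switches_per_failure * minutes_per_switch


# Figures from the talk: about 20% of builds fail, a switch costs 10 to 20 minutes.
low = expected_switch_minutes_per_build(0.20, 10)
high = expected_switch_minutes_per_build(0.20, 20)
print(f"{low:.0f} to {high:.0f} minutes of context switching per build, on average")
```

Under this model, flaky tests raise the failure rate directly, which is one way to see why an unreliable ToolChain makes the cost grow.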

Another important aspect of longer build times is that they make failures harder to debug. Why? When a build takes longer, the changeset of a single push increases. When you have a build time of 20 minutes, the granularity of what you’re trying to push through the pipeline gets coarser, because it’s just too expensive to push fine-grained changes. But the bigger the changeset is, the longer it will take on average to debug a failure. Obviously, there are more changes that might be responsible for a breakage, and this affects local builds as well as the resulting CI builds.

And when a CI build takes longer, the number of contributors with changes per CI build increases. For example, for the master build, more stuff gets merged in before the master build starts again. We have seen extreme examples with large C++ codebases where the build time was 20 hours. That meant they could run one build a day. Now, imagine they had 400 developers in the team. It was a big monorepo. Now, imagine this build fails. They had, I don’t know, 200 commits in that build. Have fun finding out which of those changes broke your build. The other thing is, if the master build takes longer, this also increases the likelihood of merge conflicts. If you have a master build that takes an hour and you’re all in one time zone, you only have, let’s say, eight opportunities a day to get your changes merged, and everyone is trying to do that.

The other thing that is important is that, when you have a failure, the time to fix it grows exponentially with the time it takes to detect it. Of course, it’s not a smooth growth curve, but in general, that is the case. It depends on things we have already mentioned: the changesets get bigger, and context switching plays a role. That is a huge incentive for having faster feedback cycles. But what we’re seeing right now in many, many organizations goes in the opposite direction, and I think it’s a huge problem.

So I can share one story. There was an organization that had build times of 25 to 30 minutes. The developers had a Maven repo with, I don’t know, 400 or 500 sub-modules. So a developer running just a normal build, compiling, running the tests, and doing the other checks and source generation, whatever, took 25 to 30 minutes. They complained to their VP, saying, we cannot work like that anymore. And then they had a task force, and that task force tried hard and got the builds faster by five minutes or so. A good, worthwhile effort, but they wanted to get it down to five minutes, not to 20 minutes or something like that. So what they decided to do was some magic: they said, well, stop running tests locally. And now, the build time is under five minutes.

But you pay a high price if you do that. If, because of growing build times, you push running the tests to a later point in the life cycle, the exponential effect hits you even harder, because now you have a long-running build and you execute it at a later time. So the time to detect a defect increases, and the time to fix that defect increases exponentially. It also increases the changeset size again, as it becomes even more inconvenient to get feedback.

So we see this happening all over the place. It’s kind of ironic. Everyone is investing in CI and CD products, but the actual practice of continuous integration is going in the opposite direction, and we have to change that. It’s often a vicious circle: long builds increase changeset size, larger changesets increase debugging time, and longer debugging time increases the average time until a build is successful, which increases the likelihood of merge conflicts. That’s something you might be confronted with.

And so some people say, well, for us, the situation is not quite like that, so does all this apply to every project? I would say a project with relatively fast builds, they pay a high waiting time cost. Whether developers are complaining about it or not, if you do the math, it would be significant. And projects with long builds, they pay both high waiting time and context switching costs. And obviously, let’s say, if you’re a small company and you have a low number of committers, you would be less affected by merge conflicts than when you have many people committing.

But some people say, hey, we’re doing microservices. We don’t have those problems. So let’s say building and testing a single repository is relatively fast because you have many repositories. There is still a lot of incentive to make it faster, as we discussed. But you’re now in a challenging situation: the producer build no longer detects that the consumer is broken. That’s the nice thing when you have a single build. The producer runs the build and sees, oh, module 10 is broken by my change. That is no longer happening. So the consumer has to figure out why they’re broken, and triaging is often very time-consuming and complex. Once the consumer notices, oh, my build is broken, the question is: dependencies one, two, and three have changed, and I made some changes too, so what exactly is responsible? Integration problems are often discovered at a late stage, and that is the price you pay with that approach. So our recommendation is, don’t go crazy with the granularity of your repositories. Otherwise, you run into serious issues because of that and other challenges.

So long feedback cycle time is poisonous, and developers might look for a different job if things get too far out of hand. OK, so that is the suffering that many of us have to live with, or whose job it is to relieve. So how do we make feedback cycles faster? There is one technology, one concept, that is essential to making builds faster, and that is build caching. Before I go into that, I want to have another poll and ask, “What build tools are you using?”

So let’s share the results. So 50% Gradle, 14% Maven, 18% Gradle and Maven, 17% other. The concept is the same. The solutions we are showing, they will work for Gradle and Maven.

So the concept of a build cache was introduced to the Java world by Gradle in 2017. It’s available for Maven and Gradle. We will eventually work to also make it available to other build systems, in the JavaScript ecosystem, et cetera. Probably not for Ant. Maybe we should ask in the next poll who is using Ant. That would be interesting. So I know 50% said they are using Gradle, and some of you might already be very familiar with the concept of a build cache, so sorry if I’m repeating something you already know, but we see a lot of confusion about what a build cache is.

People often say, well, build cache, we use Artifactory or something like that. But a build cache is completely complementary to a dependency cache. A dependency cache caches binaries that represent different source repositories. So basically, Artifactory caches Maven Central, and then the local dependency cache of your Gradle or Maven build tool caches those binaries, and they represent different source repositories. It can be used to accelerate things, but that’s not the main point. The main point is to build against a stable release from a particular source repository. A build cache accelerates building a single source repository. That’s the purpose of a build cache. And it caches build actions, for example, Gradle tasks or Maven goals.

So let’s look at it in a little more detail. All those build actions, a Gradle task or a Maven goal, have inputs and outputs. And the concept of a build cache is that when the inputs have not changed, the outputs can be reused from a previous run. Gradle has a form of that with up-to-date checking, but if you think about it, up-to-date checking only reuses output from the immediately preceding build, and only when you don’t do a clean build. So, in that respect, Gradle has had some form of build caching in it since, whatever, 2010, but that form of caching doesn’t persist over time. When you build branch A, then switch to branch B and build branch B, all the build results from branch A are gone if you don’t use a build cache, and up-to-date checking is not helping you. When you do clean CI builds or ephemeral CI builds, it’s also not helping you. But the key idea is that when the inputs have not changed, the output can be reused from a previous run. And the previous run doesn’t need to be the last run. It can be the run from two days ago, and of course, it can be reused across machines.

So, to give you an example, if you look at a compile task, the inputs are the source files, the compile classpath, the Java version, and the compiler configuration. What we’re doing when we do caching is hashing all that input, so it’s not timestamp-based, it’s content-based. Out of those hashes, we create a key that uniquely represents that input. And then we ask the cache, do you have output for that input? If not, we run the build action, take the output, and put it in the cache. If the build cache has output for that input, we get the output from the cache, unzip it into the build output directory (the Gradle build directory or the Maven target directory), and don’t run the build action, in this case, the compile task. And what is very important to understand: it’s a generic feature. It doesn’t just work for compile; it’s particularly effective for avoiding test execution. For a test action, the inputs are the source files, the runtime classpath, et cetera. The output is the test results. And if the inputs haven’t changed, the cache gives us the test results without actually running the tests.
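The lookup described here can be sketched in a few lines of Python. This is an illustration of the idea only, not how Gradle or the Maven extension actually implement it. The names `cache_key`, `run_cached`, and `fake_compile`, the in-memory dict used as the cache, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def cache_key(inputs):
    """Content-based key: hash every input's name and bytes in a stable order."""
    digest = hashlib.sha256()
    for name in sorted(inputs):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(inputs[name]).digest())
    return digest.hexdigest()


def run_cached(inputs, action, cache):
    """Return (output, was_cache_hit); run the action only on a cache miss."""
    key = cache_key(inputs)
    if key in cache:
        return cache[key], True
    output = action(inputs)
    cache[key] = output
    return output, False


def fake_compile(inputs):
    # Hypothetical stand-in for a real compile task.
    return b"classes-for:" + inputs["src/Main.java"]


cache = {}
inputs = {
    "src/Main.java": b"class Main {}",
    "compile-classpath": b"lib-1.0.jar",
    "java-version": b"11",
    "compiler-args": b"-parameters",
}
first = run_cached(inputs, fake_compile, cache)
second = run_cached(inputs, fake_compile, cache)
changed = run_cached({**inputs, "src/Main.java": b"class Main { int x; }"},
                     fake_compile, cache)
print(first[1], second[1], changed[1])
```

Because the key depends only on content, touching a file without changing it, or building on a different machine, still produces a hit, which is the difference from timestamp-based checks.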

But to be clear, caching is effective for multi-module builds. Builds with a single module will only moderately benefit from a cache. But as soon as you have a few modules, you will get significant benefits from the cache, and as your number of modules grows, the benefit increases further. If you have five modules, fine, it’s absolutely worthwhile. The results you get from applying the build cache would probably be spectacular. And I will talk more about examples where we show the savings in a couple of minutes.

But I want to explain a little bit more why the cache is effective. So you have a multi-module build, let’s say five modules. All those modules need to be built. Let’s say each one has a generate sources action, compile, checkstyle, compile the tests, and run the tests. So just look at those five actions. Now, let’s say you make a change in src/main/java in module five, and no other module depends on module five. Now, when you run the build, whether you run a clean build or a CI build on a new agent, it doesn’t matter. For modules one to four, all the outputs of all those tasks are retrieved from the cache. They don’t need to be rerun. Only the module five actions need to run, because you changed the source code. Well, the generate sources task probably doesn’t need to be run, but anyhow, you recompile, run checkstyle, compile the tests, and run the tests. That’s only one-fifth of the build actions that would otherwise need to run. So, if everything is evenly distributed, your build will be five times faster.

Another example would be that you just change src/test/java in module five, and no other module depends on module five. Then you only need to rerun compile tests and test. Even in module five, most of the stuff is still up to date. Another scenario: let’s say module four depends on module five, and you make a change in src/main/java of module five that does not change the API of module five. In that case, you need to rebuild module five, but for module four, you don’t need to run checkstyle again, and you don’t need to generate sources or compile. What you need to do, because the runtime classpath has changed, is recompile the tests and rerun the tests. But that’s a fraction of what would otherwise be necessary.

If you change module five and the API of module five has changed, then you also need to recompile module four, but all the other stuff is up to date and can be retrieved from the cache. Of course, let’s say you have a module one that every other module depends on, and you change the API of module one. Then the majority of build actions need to be executed. Still not all of them, but the majority. But those changes are usually the exception, and on average, a huge number of tasks are already built and do not need to be executed.
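The scenarios above can be written down as a small rule table. The Python sketch below encodes the rules as they are described in the talk for a hypothetical five-module layout in which only m4 depends on m5. The module and action names are illustrative, and applying the same rule to transitive dependents as to direct ones is a simplifying assumption.

```python
ACTIONS = ["generateSources", "compile", "checkstyle", "compileTests", "test"]

# What must rerun in the changed module itself, by kind of change.
RERUN_IN_CHANGED = {
    "test": ["compileTests", "test"],
    "main": ["compile", "checkstyle", "compileTests", "test"],
    "api": ["compile", "checkstyle", "compileTests", "test"],
}
# What must rerun in modules that depend on the changed module.
RERUN_IN_DEPENDENTS = {
    "test": [],
    "main": ["compileTests", "test"],
    "api": ["compile", "compileTests", "test"],
}


def dependents_of(module, depends_on):
    """All modules that depend on `module`, directly or transitively."""
    found, stack = set(), [module]
    while stack:
        current = stack.pop()
        for candidate, deps in depends_on.items():
            if current in deps and candidate not in found:
                found.add(candidate)
                stack.append(candidate)
    return found


def actions_to_rerun(changed, kind, depends_on):
    """Map each affected module to the actions that cannot come from the cache."""
    plan = {changed: RERUN_IN_CHANGED[kind]}
    for module in dependents_of(changed, depends_on):
        plan[module] = RERUN_IN_DEPENDENTS[kind]
    return {module: actions for module, actions in plan.items() if actions}


# Five modules; only m4 depends on m5 (hypothetical layout).
depends_on = {"m1": [], "m2": [], "m3": [], "m4": ["m5"], "m5": []}
total = len(depends_on) * len(ACTIONS)
for kind in ("test", "main", "api"):
    plan = actions_to_rerun("m5", kind, depends_on)
    rerun = sum(len(actions) for actions in plan.values())
    print(kind, plan, f"{rerun}/{total} actions rerun")
```

Everything not listed in the plan is what the cache can serve, which is why a change in a module with few dependents leaves most of the build untouched.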

So, as already said, even with only a few modules, the cache significantly reduces build and test times. What we see with the larger multi-module builds is that, often, 50% of the modules are leaf modules, meaning no other module depends on them. For a change in such a module, the build time is reduced to approximately 1/n of the full build, with n being the number of modules. So if you have 100 modules, like, let’s say, the Spring Boot project, then you have approximately 1/100th of the build time when you change such a leaf module. The effects of a build cache are tremendous. We work with LinkedIn, Airbnb, Tableau, and big banks, so we see that in action every day. We’re not speculating here. We have enough evidence that we can really say, if you have a multi-module build, a build cache will have spectacular results.

Checking the inputs and downloading and unpacking the cache items introduces some overhead, but it’s often very small compared to the benefit. It is something you need to monitor, though. To show you some examples, when we started introducing the cache for our CI builds, the average CI build time improved by 80%. So builds are 80% faster on average for the Gradle project itself. When you plug in the build cache, you should see some results out of the box, but then, of course, there are a lot of things you can do to optimize the cache effectiveness. We’ll talk about that later.

Many of you know the Spring Boot project. It’s built with Maven, and running compile and unit tests on my machine takes about 20 minutes. When we just plugged in Gradle Enterprise with the build cache for Maven, immediately the fully cached build was six times faster. And with some optimizations and later versions of Gradle Enterprise, we now get it under two minutes, 10 times faster when you have a fully cached build. Usually, you don’t run fully cached builds, because you actually have a change that you want to build, but with the 100 modules that Spring Boot has, many builds would be much closer to the two minutes than to the 20 minutes.

What’s important to understand is that you usually have two instances of the cache: a local cache and a remote cache. The local cache is used by developers for their non-committed changes. It’s very effective when you switch between branches, and for Maven in any case, because there you always have to run a clean build. But the changes that are published to the local cache are usually not shared with other developers. Then you have a remote cache that is usually only written to by CI, and this build output is shared with all the developers and, of course, with the other CI agents. So the remote cache makes CI builds as well as developer builds much faster. A classic example: if you do source generation, and developers come to work in the morning, pull the latest changes, and then all have to generate the same sources, well, with the build cache, CI has already done that.
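A minimal Python sketch of this two-tier arrangement follows. It is an illustration under stated assumptions rather than the real implementation: the `TwoTierCache` class, the plain dicts standing in for the cache directory and the remote service, and the example keys are all hypothetical.

```python
class TwoTierCache:
    """Local cache in front of a shared remote cache.

    `can_push_remote` models the common policy that only CI writes to the
    remote cache, while developers read from it.
    """

    def __init__(self, remote, can_push_remote):
        self.local = {}
        self.remote = remote
        self.can_push_remote = can_push_remote

    def load(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.remote:
            self.local[key] = self.remote[key]  # keep a local copy for next time
            return self.local[key]
        return None

    def store(self, key, output):
        self.local[key] = output
        if self.can_push_remote:
            self.remote[key] = output


remote = {}  # stands in for the remote cache service
ci = TwoTierCache(remote, can_push_remote=True)
dev_a = TwoTierCache(remote, can_push_remote=False)
dev_b = TwoTierCache(remote, can_push_remote=False)

ci.store("generated-sources-key", b"generated sources")
dev_a.store("uncommitted-change-key", b"dev A's local output")

print(dev_b.load("generated-sources-key") is not None)   # output produced by CI
print(dev_b.load("uncommitted-change-key") is not None)  # dev A's local-only output
```

Restricting remote writes to CI keeps output from uncommitted local changes out of the shared cache, at the price of developers not sharing results with each other directly.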

The local build cache is a cache directory on your local machine, and it speeds up single developer builds, or build agent builds when the agent preserves its state. And we have the remote build cache, which is a service that you need to install in your organization. If you use Gradle Enterprise, it provides such a service. The nodes can be configured so that they replicate from each other. You can have multiple remote nodes so that they’re close to your different locations. So that is a thing you can make very scalable. And it’s a very important part of the build cache.
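To make the local and remote split concrete, here is a minimal sketch of a two-tier cache lookup. It is not how Gradle or Gradle Enterprise implements the build cache; the names `cache_key`, `TwoTierCache`, and `run_task` are made up for illustration, and the rule that only CI writes to the remote cache is the usual setup described above, modeled here as a simple flag.

```python
import hashlib
from typing import Callable, Optional


def cache_key(task_name: str, inputs: dict[str, bytes]) -> str:
    """Hash the task name plus every declared input, in a stable order."""
    digest = hashlib.sha256(task_name.encode())
    for name in sorted(inputs):
        digest.update(name.encode())
        digest.update(hashlib.sha256(inputs[name]).digest())
    return digest.hexdigest()


class TwoTierCache:
    """Hypothetical cache: a private local store in front of a shared remote one."""

    def __init__(self, remote: dict[str, bytes], is_ci: bool) -> None:
        self.local: dict[str, bytes] = {}
        self.remote = remote  # shared by every developer and CI agent
        self.is_ci = is_ci    # assumption: only CI is allowed to write remotely

    def load(self, key: str) -> Optional[bytes]:
        if key in self.local:
            return self.local[key]
        output = self.remote.get(key)
        if output is not None:
            self.local[key] = output  # keep a local copy for the next build
        return output

    def store(self, key: str, output: bytes) -> None:
        self.local[key] = output
        if self.is_ci:
            self.remote[key] = output


def run_task(
    cache: TwoTierCache,
    task_name: str,
    inputs: dict[str, bytes],
    action: Callable[[dict[str, bytes]], bytes],
) -> tuple[bytes, bool]:
    """Return (output, was_cache_hit); run the action only on a cache miss."""
    key = cache_key(task_name, inputs)
    cached = cache.load(key)
    if cached is not None:
        return cached, True
    output = action(inputs)
    cache.store(key, output)
    return output, False
```

In the source generation example, CI would call `run_task` first and populate the remote store; a developer build with the same inputs then gets a hit without running the generator, while a developer build with changed inputs runs it and stores the result only locally.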

So we just got these numbers from a customer of ours a couple of months ago. Once they started to introduce the build cache, their agent availability was much better. About 40% of their CI builds were queued, on average, and then as they started to introduce the cache and started to optimize the cache, for the first time ever, they had a situation where not a single CI build needed to be queued. So in terms of CI availability and also costs of CI, the build cache is usually very impactful.

If you want to learn more, for Maven, just Google for Maven build cache and it will take you to this page. For Gradle, there is a very good tutorial on how to get started with caching. If you’re not using the cache yet, not using the remote cache yet, absolutely get going with that. Even for very small projects, like, let’s say, SLF4J. To build and run the tests for SLF4J takes 30 seconds, 25 to 30 seconds. If you do it with a cache, a fully cached build gets down to 10 seconds. And if you run this many times, it’s worth it. So absolutely, build cache. There’s nothing that can give you the same performance improvements as a build cache, just because the fastest thing is not to do things at all.

The other thing I want to show you is something we’re working on right now. It’s not released yet; it’s distributed testing. So, what you can see here, here, we have a Gradle build. For Maven, it will work in a similar way. You can tell Gradle, hey, Gradle, run the tests that I still need to run. Run them in a distributed way. And the way we are doing this will be a hybrid approach. So you can tell us, hey, you can use so and so many processes locally and then use so and so many agents remotely, or ask a service how many remote agents are available. So, in this case, we run the tests with two local processes, and we have an integration with Jenkins where we ask Jenkins, hey, how many build agents do you have for me to run my tests? And Jenkins said, hey, I have two for you. So now, we have two local processes and two remote agents. Well, you don’t see that much except the timer running, 10 seconds, 11 seconds. So, what you can see here, the build took 17 seconds. Otherwise, it would have taken 34 seconds. And if you want to have even faster builds, just throw more resources at the problem. So we’re super excited about that.

It’s directly integrated with the build system, so it works for Gradle and Maven just by running the build. It uses the resources on your local machine as well as remote agents. And in particular, it’s very convenient to use for local changes. You don’t need to somehow go via CI. Just run your build and it will run the tests remotely as well as locally. A lot of the secret sauce will be to make it really low overhead. The distribution will be very fine-grained. So it would be per test class. So even if all your tests live in one module, we will distribute them test class by test class. Oh, there’s a typo, it’s not test cause but test classes. Smart scheduling for maximum utilization, so you don’t need to do any sharding yourself. A high number of resources can be used, and it’s even configurable based on the type of build.
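The talk does not say how the scheduler works, so the following is only a sketch of the general idea of distributing per test class instead of per module. It uses a simple greedy rule (longest test class first, always onto the least loaded worker), and the function name `schedule` and the worker names are assumptions made for the example.

```python
import heapq


def schedule(
    test_durations: dict[str, float], workers: list[str]
) -> tuple[dict[str, list[str]], float]:
    """Assign test classes to workers, longest first, onto the least loaded worker.

    Returns the plan per worker and the estimated wall-clock time (makespan).
    """
    heap = [(0.0, worker) for worker in workers]
    heapq.heapify(heap)
    plan: dict[str, list[str]] = {worker: [] for worker in workers}

    # Longest classes first; the name breaks ties so the plan is deterministic.
    ordered = sorted(test_durations.items(), key=lambda item: (-item[1], item[0]))
    for test_class, seconds in ordered:
        load, worker = heapq.heappop(heap)  # currently least loaded worker
        plan[worker].append(test_class)
        heapq.heappush(heap, (load + seconds, worker))

    makespan = max(load for load, _ in heap)
    return plan, makespan
```

With historical durations per test class, the same function can be called with one worker to estimate the sequential time and with two local plus two remote workers to estimate the distributed time, which shows why a single big module stops being the limiting factor.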

So there is a question, “Will it be possible to execute whole tasks on Jenkins?”

So, at the moment, our first step into the whole distributed terrain is to just focus on tests and make it more fine-grained than per task. So this will work just for tests, and the distribution unit will not be a task but a test class or method. What we have already spiked and what we will address in some form, probably next year, is, besides the tests, how you can run any type of task or Maven goal in a distributed way. So that is not what this feature is about. That would be the next step that we will be working on. We focus on tests at the moment for two reasons. They are a very special problem, and that’s often where most of the time is going, and when you distribute things via tasks, you rely to a degree on how well your code is modularized. Otherwise, you have a super big task, a super big module that is executing 10,000 tests, and then your unit of distribution is 10,000 tests. With this approach, it doesn’t matter because we distribute not per task but per test class. But nonetheless, this is also something we are working on and you can expect next year, probably. It’s not scheduled yet, but if you ask me now, that will be my prediction.

We talked a lot about what you can do to make things faster in terms of acceleration technologies, but it’s also extremely important that you can actively improve your performance. So what I mean by that is, performance regressions are easily introduced. Infrastructure changes, binary management, CI agent configuration, how the caching is set up. People introduce new annotation processors or change versions of annotation processors and have no idea that the compile time increases by a factor of five. Build logic configuration settings, compiler settings, memory settings.

So what we see at the moment is, what happens today with most regressions is they’re either unnoticed; noticed but unreported, because people haven’t had a lot of luck in reporting regressions; or reported but not addressed, because the root cause is hard to detect, especially with flaky issues, and the overall impact and priority cannot be determined. If I say, oh, my build is slow, let’s say to the build team, how do you then prioritize that? If you had data to see, well, 50 developers are affected by that performance problem, it would be pretty easy. If you don’t have that data, it’s hard for you to make a call. And then, often, they get escalated after they have caused a lot of pain, and that’s the only way they really get addressed. Then the problem gets fixed after it has already wasted a lot of time and caused a lot of frustration. So the result is the average build time is much higher than necessary and continuously increasing.

The only way to have high-performing builds is to have analytics and data. Even with distributed test execution and build caching, there is so much that can go wrong and that can affect performance. You need data to optimize your ToolChain, just like you need data to optimize the performance of your application. And the key thing is, whenever a build is executed, locally or remotely, you need to capture data, and then when the data is good and comprehensive, you can often easily detect the root cause of the problem instead of reproducing the problem. And the problem can be detected early. And having all the data from every build allows you to prioritize decisions. The only way to have a performing ToolChain is to have insights into the ToolChain. I mean, it’s pretty obvious. But hardly any organization has it.

So I just want to show you very quickly what I mean by that, what this could look like. This is Gradle Enterprise, which offers such a data collection, but however you achieve that, you need the data and you need the analytics to have an effective ToolChain. So here, what you can see here, we have executed 110,000 builds in the last seven days. And let’s say Hans is always complaining about builds, saying, oh, my builds are always failing. You can look for it. You can say, OK, what’s the situation with Hans? OK, he had a few failing builds, and then you could look at the reason why they failed. So let’s do this again. You can go to any build. And that’s what we call a build scan. And then you have very comprehensive data. What was the infrastructure setup? How was the build configured? What dependencies were used? A deep dive into every aspect of the performance of the build. You know, memory situation, et cetera, et cetera. And with that, when someone says, my build is slow, and they send you a build scan, you can now see, oh, is it because of dependency download time? Is it because of whatever, right? This is key.

The next point, which many people described as the biggest pain point, is reliability issues. And that is a huge pain and frustration, including flaky tests. So how can you make your developers more productive by either letting them run into fewer incidents with the ToolChain or, if there is an incident, providing better support?

So there are two types of failures. Well, three. One is a verification failure. So two different types of ToolChain failures. So a verification failure is a syntax error detected by compilation. Code style violation detected by checkstyle. Misbehavior in the code detected by a JUnit test. And then you have non-verification failures, flaky tests, binary repository down, out of memory exception when running the build. And then you have slow builds. They’re not strictly a failure, but obviously a problem you need to help people with. And sometimes, maybe only a few builds are slow and then you need to figure out why exactly those.

And triaging and prioritization is often very difficult for various reasons. So one is non-verification failures masked as a verification failure, for example, a flaky test. The developer looks at it and thinks, oh, my test failed. Less common, but it still happens, is a verification failure masked as a non-verification failure. Let’s say a snapshot dependency issue. You think something must be wrong with the build. It worked five minutes ago. Now, the build is not working. And you think it’s a non-verification failure, but actually, a new snapshot dependency version was picked up. And then, when you have non-verification failures, it might be unclear, is it caused by a bug in a Gradle or Maven plugin or is it a user misconfiguration? Hard to figure out, very often. Many issues are flaky and hard to reproduce. And the general problem is, you do not have enough information available to help efficiently. No data for local builds is collected and only limited data for CI builds, the console log and the test reports.
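One common way to separate a flaky test from a real verification failure, once results from every build are collected, is to look for tests that both passed and failed on the same commit. The sketch below illustrates that idea only; the record type `Outcome` and the function `find_flaky` are hypothetical names, and the webinar does not describe how Gradle’s own flaky test detection works.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Outcome(NamedTuple):
    """One test result captured from one build (hypothetical record shape)."""
    commit: str
    test: str
    passed: bool


def find_flaky(results: Iterable[Outcome]) -> list[str]:
    """Return tests that both passed and failed for the same commit."""
    seen: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for result in results:
        seen[(result.commit, result.test)].add(result.passed)
    flaky = {test for (_, test), outcomes in seen.items() if outcomes == {True, False}}
    return sorted(flaky)
```

The snapshot dependency case mentioned above is the caveat: the same commit does not guarantee the same inputs, so a candidate found this way still needs a look at what else differed between the two builds.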

So that’s why most troubleshooting sessions begin with a game of 20 questions, and people are afraid of that. That’s why they often don’t report issues. And the person asking for help often doesn’t know what context is important. Helpers can burn out helping. It’s often not what people want to do the whole day. But when it’s inefficient, it consumes a big amount of the time. And one big problem is, root cause analysis is often impossible without the helper trying to reproduce the problem. And the impact analysis is not data-driven. So when someone is complaining about a reliability issue, how important should it be to fix it?

So again, same with performance, you need to capture data from every build run, local and CI. It’s the only way to effectively diagnose flaky issues. But the data has to be comprehensive to allow for root cause analysis without reproducing. And having all the data allows for impact analysis.

One way to get this data is by Gradle Enterprise. You can install it on-premises. You can easily connect it with all your Gradle and Maven builds, and it will send a very comprehensive set of data that you can even extend with your own custom data so that you can effectively figure out the root cause for issues you have when it comes to ToolChain reliability. And it’s a game-changer once you have that. You could think about creating your own solution. Writing plugins that extract all the data from the build, pump it to some data store. However you do it, you need the data of each and every build, local and CI. Otherwise, you cannot effectively deal with reliability issues.

As I already said, build scans or something similar are often your only friend when you try to help someone and have to fix an issue. We were actually trying out the build cache with a large Maven build at a prospect. They have a, I don’t know, 500 sub-module Maven build and huge build-time issues. And then we were installing the build cache. And when I did the first experiment, they were very disappointed. They said, hey, the build with the build cache takes five times longer than the build without it. And we were like, that cannot be. And remember, we were talking with the build experts there, their build team that really understands their build infrastructure. So we said, something must be different between the two build runs. And they said, no, it’s exactly the same. And they almost got annoyed by us questioning, basically, whether the builds are really running the same way. But then we said, hey, let’s look at the build scans. And then we looked at the infrastructure section and we could see, well, the build without the build cache is using 128 threads. The build with the build cache is only using one thread. And they were very surprised by that. They expected that not to be the case. But they had some weird configuration issues, and they made a hypothesis, they made an assumption that the builds were run the same way, but without having the data that tells you exactly how many threads have been used, it’s hard.

Human beings are not as good at interpreting what Maven or Gradle does as the tool itself. So rather capture the data from the tool than rely on speculative assumptions from people. No, the builds are running the same way. I haven’t changed anything. Oh no, my memory is one gigabyte. Well, maybe you have misspelled MAVEN_OPTS or GRADLE_OPTS and your memory is actually only 128 megabytes. You get the idea.

But even better than helping people with incidents is to avoid incidents. And one thing I want to show you is, so developers very often struggle, especially with failing CI builds, to understand, is that my change or is it something else that has changed? I do not know why this build has failed. So even when it’s a verification failure, they cannot reason enough about it, so they escalate it or they file a support ticket.

So there is one beautiful thing you can do when you capture the data of every build all the time. First of all, here’s the failing CI build, so we look at the build scan and it gives us a lot of information, but we still don’t know why it has failed. But what we know is, hey, it has succeeded previously, and we think nothing has changed. So let’s say, locally, I run the build with exactly the same commit ID and it worked. So when you have all the data, you can compare. And now, we compare the builds and we can see, oh, there’s a difference in dependencies. The CI build was using version 1.6 and the local build was using version 1.5. Without such a capability, it’s so hard for developers to reason about it. So they file issues, or they delete every cache in the universe and then run the build again to see if it’s now working. You get the idea. Now, they can click on 1.6 and they can see, ah, there are the dynamic dependencies and the CI build just picked up a later version. So no incident would be filed.
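The comparison itself is conceptually simple once both builds have recorded their resolved dependencies. As a sketch, assuming each build’s data is available as a plain mapping from module to resolved version (the function name `diff_dependencies` and the module coordinates in the usage note are made up):

```python
from typing import Optional


def diff_dependencies(
    build_a: dict[str, str], build_b: dict[str, str]
) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return modules whose resolved version differs between two builds.

    A version of None means the module was not resolved in that build at all.
    """
    differences = {}
    for module in sorted(build_a.keys() | build_b.keys()):
        version_a = build_a.get(module)
        version_b = build_b.get(module)
        if version_a != version_b:
            differences[module] = (version_a, version_b)
    return differences
```

Calling it with a local build that resolved `com.example:lib` to `1.5` and a CI build that resolved it to `1.6` returns just that one entry, which is the kind of answer that saves a developer from deleting every cache and trying again.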

The other thing I wanted to show is how you can proactively do performance optimizations and reliability improvements. So one thing we can do when we have all the data is a failure analysis. Here, we analyze failures, I think over the whole history of data that we have collected. You can look at all failures, verification and non-verification failures. We analyze the failures and classify them as verification or non-verification failures. So now, you can look at all the non-verification failures, the reliability issues. So you can now look at some of them.

For example, we have an interesting one here, this one here. When we look at this particular failure, we can see, wow, this has affected quite a few builds. So it didn’t occur, then it occurred, and then it mostly went away. And you can now reason about it. You can see, OK, 13 users were affected by it. All of those failures only happened when doing local builds, 500 in all, and none of them when doing CI builds. So this is a failure that only affects local builds. And then we looked at the timeline. We looked at a couple of other things. And then what we could find is we have removed the sub-project, the announce project. The announce plugin will no longer be in Gradle 6 at all. But we had many people in the build configuration assuming that this still exists and using the announce plugin. So we removed something and our developers always use the latest version of Gradle, and then we saw, now, we have a completely new type of build failure. And then we informed them and we did some communication, and now, almost all those failures are gone. But the beautiful thing is, if you analyze the data, you can immediately see how many people are affected and what types of builds are affected.
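The kind of summary described here (how many users, how many local builds, how many CI builds) can be derived from very little data per failed build. The following is a sketch under the assumption that each failed build is recorded with a user, an environment, and a normalized failure message; `FailedBuild` and `failure_impact` are illustrative names, not part of any Gradle API.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class FailedBuild:
    """Hypothetical record of one failed build."""
    user: str
    environment: str  # expected to be "local" or "ci"
    failure: str      # normalized failure message or exception type


def failure_impact(builds: Iterable[FailedBuild]) -> list[tuple[str, dict]]:
    """Summarize each failure: affected users plus local and CI build counts."""
    summary: dict[str, dict] = {}
    for build in builds:
        entry = summary.setdefault(
            build.failure, {"users": set(), "local": 0, "ci": 0}
        )
        entry["users"].add(build.user)
        entry[build.environment] += 1
    # Failures that hit the most people come first, so they can be prioritized.
    return sorted(
        summary.items(), key=lambda item: len(item[1]["users"]), reverse=True
    )
```

A failure with many users and a CI count of zero, like the removed plugin above, points at something specific to developer machines, while a failure with a single CI user points at the CI infrastructure.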

So, for example, here, we analyzed the build performance of the last four weeks. So our average build time is 2 minutes and 50 seconds. Because of build avoidance, we saved 16 minutes and 40 seconds per build. So, let’s say, if Gradle were built with Maven without a build cache, our builds would take 20 minutes on average. But then you can do interesting things, like, let’s say, dependency downloading. On average, we spend two seconds per build on dependency downloading, which is good. Our average cache overhead per build is four seconds. But you can do some interesting stuff. You can go to our performance diagnostic board and you can say, hey, show me the last 10,000 builds and only focus on the dependency downloading time. And then, we can now look at builds and see, wow, here, we had a build where we actually spent 45 seconds on dependency downloading. So now, we can go to the build scan. We can go to performance. We can go to network activity, and we can learn, OK, we downloaded quite a lot of dependencies, 240 megabytes, but also, the bandwidth was very low. So let’s say one of your people complains about a slow build. Very often, it’s dependency download issues. So you now can actively analyze that and can see, were there any downloads? How high was the bandwidth? So, again, you need the analytics to see where there is interesting stuff to dive into deeper, and then you need a comprehensive set of data to reason about the root cause of a certain behavior.
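The numbers in this example add up as follows, and the same data can be used to pick out builds like the 45-second one. This is only a sketch: the function `download_outliers`, its `factor` parameter, and the rule of comparing against a multiple of the median are assumptions for illustration, not how the performance dashboard works.

```python
import statistics
from datetime import timedelta

# Figures quoted in the talk.
average_build = timedelta(minutes=2, seconds=50)
saved_by_avoidance = timedelta(minutes=16, seconds=40)

# Estimated average build time without build avoidance.
without_avoidance = average_build + saved_by_avoidance
print(without_avoidance)  # 0:19:30, roughly the 20 minutes mentioned


def download_outliers(
    download_seconds: dict[str, float], factor: float = 10.0
) -> list[str]:
    """Return build ids whose dependency download time exceeds factor x the median."""
    typical = statistics.median(download_seconds.values())
    return sorted(
        build_id
        for build_id, seconds in download_seconds.items()
        if seconds > factor * typical
    )
```

The outlier list is only the starting point; as described above, the per-build data (how much was downloaded, at what bandwidth) is what explains why a particular build was slow.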

Yeah, here is another example. We just got this also from a customer. What you can see here is, it’s hard to read, but they had a new incident occurring and only one user was affected. The CI user. The Jenkins user. So they immediately know, this is only a CI problem. And what happened, basically, was that the infrastructure team changed their Docker configuration, and they had no idea that this was affecting and introducing errors on CI builds. And the beautiful thing is, when you have the data, you immediately see, is it a one-off thing or is it a trend? You don’t have to wait until things get escalated. You can proactively monitor. Let’s say someone is having a new exception. You can search. When someone is having an exception, you can paste the error message from the command line, and we look for exceptions with similar messages to see, is this a one-off thing or is this happening to two or more people?
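How the similarity search is implemented is not described in the talk. As a rough illustration of the idea, the Python standard library’s `difflib.get_close_matches` can rank previously seen failure messages by textual similarity to a pasted one; the wrapper name `similar_failures` and the `cutoff` default are choices made for this sketch.

```python
import difflib


def similar_failures(
    message: str, known_messages: list[str], cutoff: float = 0.8
) -> list[str]:
    """Return previously seen failure messages that closely resemble the given one."""
    if not known_messages:
        return []
    return difflib.get_close_matches(
        message, known_messages, n=len(known_messages), cutoff=cutoff
    )
```

If the result contains messages from several different users, the failure is probably a trend and not a one-off, which is the question the speaker wants answered before anything gets escalated.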

So one of the big problems when it comes to reliability is flaky tests. That is a problem that is affecting most organizations; I don’t know an organization that doesn’t have that problem. We are currently working on a solution. The plan is to release this in Q4 2019. We will keep everyone who has signed up for the webinar informed about the new features we’re releasing. So we’re super excited about that. That is a huge problem. We had to create a custom solution for our own engineers, and we are now super excited that we can productize something that can really help with that.

Yeah, so if you’re interested in learning more, we do a hands-on workshop where we basically have content similar to this presentation, but we do this together with labs and exercises that you can do to really get a deeper understanding of how to solve some of those problems. So, if you’re interested, you can sign up with this URL. In the follow-up email, we will also mention that.

So you can find more resources on that topic here. We are also working on a book, called Developer Productivity Engineering, and we hope to release the rough draft in several weeks. So we will also inform you via email about that.

Thank you very much for your time. We covered a lot of ground. But any more questions from the audience?

“Is it possible to run tests only for code that you touched? Something like build optimization to speed up build?”

Yes, that is what the build cache is giving you. So, if you think about it, if you have a multi-module build, you have exactly that effect. You change module 10. There are no dependencies on modules one to nine, and then we will not run the tests for those modules one to nine and get the results from the build cache. So, if you think about it, the build cache is providing you exactly that kind of incrementality. But, of course, it is heavily affected by how well-modularized your code is. If all the tests live in one module and all your source code lives in one module, then it will not provide you that effect.
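The incrementality described in this answer follows from the module dependency graph: only the changed module and the modules that depend on it, directly or transitively, have changed inputs, so only their tests miss the cache. A small sketch of that reasoning (the function name `modules_to_retest` and the module names in the usage note are hypothetical):

```python
from collections import deque


def modules_to_retest(depends_on: dict[str, set[str]], changed: str) -> set[str]:
    """Return the changed module plus every module that transitively depends on it.

    depends_on maps each module to the modules it depends on directly.
    """
    # Invert the graph: for each module, which modules depend on it?
    dependents: dict[str, set[str]] = {}
    for module, dependencies in depends_on.items():
        dependents.setdefault(module, set())
        for dependency in dependencies:
            dependents.setdefault(dependency, set()).add(module)

    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected
```

If `module10` depends on `module9` and nothing depends on `module10`, a change to `module10` yields only `{"module10"}`, and the test results for all other modules can come from the cache. If everything lives in one module, the answer is always that one module, which is the limitation the speaker points out.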

What I’ve never seen implemented successfully or used successfully is some incremental class analysis where you try to figure out from the class dependency graph which changes affect which part of the code base and then which tests need to be run. So my recommendation is, use the build cache and improve the modularization of your code. Then you will get the majority of those benefits from that.

“Running CI builds with the build cache on. Is it safe? Right now, I do everything I can to ensure that the build is clean.”

Yeah, it’s absolutely safe. Some of the organizations with some of the most critical software stacks in the world are using a build cache. The build cache is safe because it only works for the tasks that are declared cacheable and where the input is properly described. So absolutely, you should use it on CI. Gradle is delivered that way. I don’t want to scare you, but I don’t think we have ever had any issues with that. And I mean, Google is using build caches. Airbnb, big banks. It’s absolutely safe, yes. But the key thing is, then, also have monitoring in place that if there are issues, you’re able to debug them.

“Is there a good resource for best practices for these variables we have discussed? Guidance on the architecture principles.” 

We are working on that. We have some resources. This is a work in progress. We will keep you up to date.

Yeah, there’s quite an interesting discussion about distributed builds. “Will distributed testing be an Enterprise feature?”

Yes. It will be an Enterprise feature, yes. It will work for Gradle and for Maven. So it lives on a different level and it goes together with flaky test detection.

OK, someone else said their biggest problem is build immutability. So that is a good point. I think the practice of build caching will also push you toward build immutability, because I could show you amazing things you can do with the data to discover volatile inputs. So build immutability and build caching are close friends. Let’s put it like that. And so we can definitely help you with that.

Thanks to everyone. We appreciate your time. Please contact us on Gradle.com if you have any questions. Thank you, bye.