Just like failing tests, flaky tests indicate that something might be wrong with the user-facing behavior of your product. It is important to track and fix them as soon as possible. Develocity 2020.2 features test analysis improvements that streamline test debugging and root cause identification. They include:
- A redesigned test result view that provides quick access to relevant contextual information, such as recent test history and other failed/flaky tests from the same build
- Colocation of all executions of a single test to make output and exception comparisons simpler
- Stacktrace analysis that shows only stackframes relevant to the current context
In this post, we will explore each of these improvements in detail by debugging a real flaky test in Gradle’s own build.
Investigating a spike in flaky tests
Gradle is used by millions of developers, so it’s especially important that we fix flaky tests before they start causing build failures for you. During the Gradle 6.4 release candidate cycle, I discovered a concerning spike in flaky tests using the recently introduced Tests Dashboard.
The Tests Dashboard lists the test classes that report the most flaky outcomes. I start my investigation by choosing the top one in the list, which is much flakier than the others.
Looking at the outcome history for this test, I can see that it was introduced within the past week and was flaky from the start. These are exactly the kinds of things we want to catch early, so I start investigating immediately.
Flaky test analysis
From the test history, I click the most recent flaky test result, which opens the revamped test result page in the build scan.
I can see the relevant part of the test name right at the top with some contextual information underneath. One thing that catches my attention immediately is that this is not the only flaky test in this build. Perhaps tests are interfering with one another, so I click “2 flaky” to see which tests were flaky.
There’s definitely something here. I go back to the test in question to take a deeper look.
Next, I see that this test was executed two times: the first execution failed and the second passed. I scroll down to the test output to see if I can figure out where it failed.
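To make the terminology concrete, here is a minimal sketch of how a test outcome could be derived from its executions within a single build. It assumes the common definition that a test is flaky when it both fails and passes in the same build; the `Outcome` enum and `classify` function are hypothetical names for illustration and are not part of any Develocity API.

```python
from enum import Enum
from typing import Sequence


class Outcome(Enum):
    PASSED = "passed"
    FAILED = "failed"
    FLAKY = "flaky"


def classify(executions: Sequence[bool]) -> Outcome:
    """Classify one test from its executions in a single build (True = passed)."""
    if not executions:
        raise ValueError("a test needs at least one execution")
    if all(executions):
        return Outcome.PASSED
    if not any(executions):
        return Outcome.FAILED
    # Mixed results in the same build: same code, different outcomes.
    return Outcome.FLAKY


# The test in this post: the first execution failed, the second passed.
print(classify([False, True]).value)  # flaky
```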
Then I click the arrow next to the test execution header to jump to the next execution and see how the output differs.
I notice here that the “Calculating task graph…” line is at the end of the passed execution, but at the beginning of the failed execution. The passed execution was also much quicker than the failed one. I note these irregularities in case they are helpful with root cause identification.
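The same kind of comparison can be sketched outside the build scan with Python's standard `difflib` module. Only the "Calculating task graph…" marker comes from the real output; the other lines are placeholders, and the helper names `marker_position` and `output_diff` are made up for this example.

```python
import difflib

MARKER = "Calculating task graph"

# Placeholder output: only the marker line is taken from the post.
failed_output = [
    "Calculating task graph…",
    "placeholder line A",
    "placeholder line B",
]
passed_output = [
    "placeholder line A",
    "placeholder line B",
    "Calculating task graph…",
]


def marker_position(lines, marker=MARKER):
    """Return the index of the first line containing marker, or None."""
    for index, line in enumerate(lines):
        if marker in line:
            return index
    return None


def output_diff(first, second):
    """Return a unified diff between the outputs of two executions."""
    return list(
        difflib.unified_diff(
            first, second, fromfile="failed", tofile="passed", lineterm=""
        )
    )


print(marker_position(failed_output), marker_position(passed_output))  # 0 2
print("\n".join(output_diff(failed_output, passed_output)))
```

A line that moves between executions shows up in the diff as one removal and one addition, which is a quick hint that ordering or timing differs between the runs.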
I’m ready to start investigating the code, so I scroll to the root exception of the failing test.
The summarized stacktrace shows that AbstractUndeclaredBuildInputsIntegrationTest is related to this failure. I discover that several test classes extend this class, and all of them have reported flaky results recently.
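As a rough illustration of what summarizing a stacktrace can mean, the sketch below drops frames that belong to the JDK or the test framework and keeps the rest. This is an assumption about the general idea, not a description of how build scans select frames. The prefix list, the `example` package, the method name, and the line numbers in the sample trace are all placeholders.

```python
import re
from typing import List

# Matches Java-style frames such as "at pkg.Class.method(File.java:12)".
FRAME = re.compile(r"^\s*at\s+(?P<cls>[\w.$]+)\.(?P<method>[\w$<>]+)\([^)]*\)\s*$")

# Assumed "noise" prefixes; a real tool would likely use a richer rule set.
NOISE_PREFIXES = ("java.", "jdk.", "sun.", "org.junit.")


def relevant_frames(stacktrace: str) -> List[str]:
    """Keep only the frames whose class is outside the noise prefixes."""
    kept = []
    for line in stacktrace.splitlines():
        match = FRAME.match(line)
        if match is None:
            continue  # exception message or other non-frame line
        if match.group("cls").startswith(NOISE_PREFIXES):
            continue
        kept.append(line.strip())
    return kept


# Placeholder trace: the package, method, and line numbers are invented.
SAMPLE = """java.lang.AssertionError: placeholder message
    at example.AbstractUndeclaredBuildInputsIntegrationTest.placeholderTest(AbstractUndeclaredBuildInputsIntegrationTest.groovy:1)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:1)
    at java.lang.Thread.run(Thread.java:1)
"""

for frame in relevant_frames(SAMPLE):
    print(frame)
```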
I reach out to the team with the evidence I’ve collected. After a short while, we discover and fix a test that wasn’t properly cleaning up; since that time we’ve seen a sharp decrease in the number of flaky tests.
In the end, we eliminated a major new source of flakiness. Luckily, this time the cause was a misbehaving test and not a user-facing bug with the potential to cause a lot of user pain and frustration. More importantly, the noise from these flaky tests no longer hinders us from catching new spikes in flakiness.
We hope that these improvements to Maven and Gradle build scans in Develocity 2020.2 help you quickly eliminate flaky tests from your test suite. Drop us a line if you have feedback or want to give this feature a spin. Until next time!