Developer Productivity Engineering Blog

5 Ways to Use Develocity to Identify and Manage Flaky Tests

Dealing with flaky tests is a significant challenge in software development. These unpredictable and inconsistent tests can pass or fail without any changes in code, casting doubt on the reliability of your toolchain and ultimately on the application itself. The presence of flaky tests can significantly impact developer confidence and productivity. To better understand why you need to address these flaky tests, read Seven Reasons You Should Not Ignore Flaky Tests.

Fortunately, tools such as Develocity can aid in tracking down and mitigating these issues. Here are five ways you can use Develocity to better manage flaky tests.

1. Flag Potential Flaky Tests: Develocity Test Failure Analytics provides a streamlined way to flag flaky tests. By monitoring a build’s test results, it identifies potential flaky tests: ones that fail initially and pass when re-run. This data appears in the Tests dashboard, so whenever you look at test results you can see whether your builds contain flaky tests.

You can sort the list of test results by ‘flakiness’ to identify the tests with the least consistent results, and use that ranking to prioritize any action.
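To make the idea concrete, here is a minimal sketch of the kind of logic described above. It is not Develocity's implementation or data format. It assumes hypothetical (test name, outcome) records per build, treats a test as flaky when it both failed and passed within the same build, and ranks tests by the share of builds in which they were flaky.

```python
from collections import defaultdict


def classify_build(executions):
    """Classify each test in one build as 'passed', 'failed', or 'flaky'.

    executions: list of (test_name, outcome) tuples, where outcome is
    'passed' or 'failed'. This record shape is an assumption made for
    illustration, not a Develocity format.
    """
    outcomes = defaultdict(list)
    for test_name, outcome in executions:
        outcomes[test_name].append(outcome)

    result = {}
    for test_name, seen in outcomes.items():
        if "failed" in seen and "passed" in seen:
            # Mixed outcomes in the same build: failed once, passed on another run.
            result[test_name] = "flaky"
        elif "failed" in seen:
            result[test_name] = "failed"
        else:
            result[test_name] = "passed"
    return result


def rank_by_flakiness(builds):
    """Return [(test_name, flaky_rate)] sorted from most to least flaky.

    builds: list of dicts as returned by classify_build, one per build.
    """
    flaky = defaultdict(int)
    total = defaultdict(int)
    for classification in builds:
        for test_name, status in classification.items():
            total[test_name] += 1
            if status == "flaky":
                flaky[test_name] += 1
    rates = [(name, flaky[name] / total[name]) for name in total]
    # Highest rate first; ties broken by name so the order is stable.
    return sorted(rates, key=lambda pair: (-pair[1], pair[0]))
```

A ranking like this is only as good as the retry data behind it: a test that is never re-run after failing shows up as failed here, not flaky.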

2. Investigate Test Failures: Once you’ve identified a flaky test, you’ll want to find out why it’s failing. Develocity Build Scan® provides a detailed examination of each build and test run, delivering a granular view of the data. The Build Scan will show you why a test was flagged as flaky.

From a comprehensive set of options, you can use the Build Scan to drill down into details about the test’s environment, inputs, and outputs, which may give you clues as to why it fails unpredictably.
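One way to turn those details into clues is to compare a passing run with a failing run and look only at what differs. The sketch below assumes you have already collected each run's attributes into a plain dictionary; the function and the attribute names used with it are hypothetical and do not come from a Build Scan export.

```python
def differing_attributes(passing_run, failing_run):
    """Return {attribute: (passing_value, failing_value)} for every
    attribute whose value differs between the two runs.

    Attributes missing from one run are reported with None on that side.
    """
    keys = set(passing_run) | set(failing_run)
    return {
        key: (passing_run.get(key), failing_run.get(key))
        for key in sorted(keys)
        if passing_run.get(key) != failing_run.get(key)
    }
```

For example, if the two runs share the same operating system and JDK but differ in time zone, only the time zone entry is returned, which narrows down where to look first.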

3. Leverage Historical Data Analysis: One of the primary challenges with flaky tests is the difficulty in reproducing issues for analysis. Develocity tracks all test executions over time, which enables you to compare different runs. With this data, you can see when and where the test passed or failed, under what conditions, and in which context. This can provide the vital clues needed to understand the circumstances under which tests fail.

You can also use this history to find out when the test started to become flaky or to check if you have in fact fixed a flaky test.
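The two questions above (when did the test become flaky, and is it really fixed) can be sketched as simple checks over a test's chronological history. This is an illustration under stated assumptions, not how Develocity computes anything: the history is a hypothetical list of per-build statuses, and the `window` of consecutive passing runs that counts as "fixed" is an arbitrary threshold you would tune yourself.

```python
def first_flaky_index(history):
    """Return the index of the first 'flaky' entry, or None if there is none.

    history: chronological list of per-build statuses for one test,
    each 'passed', 'failed', or 'flaky' (an assumed format).
    """
    for index, status in enumerate(history):
        if status == "flaky":
            return index
    return None


def looks_fixed(history, window=20):
    """Return True if the test was flaky earlier but its most recent
    `window` runs all passed.

    A run of passes does not prove a fix; it only makes one more plausible.
    """
    if len(history) < window:
        return False
    recent = history[-window:]
    earlier = history[:-window]
    return "flaky" in earlier and all(status == "passed" for status in recent)
```

The index returned by `first_flaky_index` points at the build where flakiness first appeared, which is a natural place to start looking at what changed.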

4. Collaborate: Develocity also supports efficient collaboration. After identifying and analyzing a flaky test, you can share the findings with your team using shareable Build Scan links that point to the exact line item you want others to view. These links ensure everyone is not only on the same page but on the precise, relevant part of that page, fostering collaborative problem-solving and knowledge-sharing.

5. Monitor Continuously: Finally, after you have addressed the flaky tests, Develocity offers continuous monitoring to ensure they don’t reappear. You can use this to keep a close eye on the health of your tests over time and to be alerted to any potential regressions.

In conclusion, Develocity is a valuable tool for dealing with flaky tests. Its ability to identify, investigate, and monitor inconsistent tests, paired with its collaborative features and historical data analysis, makes it a comprehensive solution for maintaining the integrity of your testing suite. Remember, catching and addressing flaky tests early on is essential for delivering high-quality software, and tools like Develocity can play a vital role in this process.

Next Steps

In the first blog post in my series on flaky test insights and best practices, I explained why it’s critical to address flaky tests and not ignore them. By now, you have some ideas about how to identify and track flaky tests leveraging Develocity tooling. Stay tuned for the final post in this series where I provide more specific tips on how to fix flaky tests.

In the meantime, you can learn more about Develocity Failure Analytics, which provides not only Test Failure Analytics (and flaky test management capabilities in particular) but also Build Failure Analytics, which addresses non-test-related failures like compilation problems.