Developer Productivity Engineering Blog

Identifying and analyzing flaky tests in Maven and Gradle builds

Flaky, or non-deterministic, tests are a serious and prevalent problem in modern software development. If your application interacts with browsers, external devices or services, or has asynchronous behavior, it’s likely you have suffered from flaky tests. Martin Fowler has this to say about flaky tests:

Non-deterministic tests have two problems, firstly they are useless, secondly they are a virulent infection that can completely ruin your entire test suite. As a result they need to be dealt with as soon as you can, before your entire deployment pipeline is compromised. — Martin Fowler on Eradicating Non-Determinism in Tests

Flaky tests compromise deployment pipelines by slowing them down and eroding confidence in the correctness of changes. When builds regularly fail because of flaky tests unrelated to the change being made, time is wasted, features are delayed, and developers become reluctant to make changes.

Develocity 2019.5 introduces tools for identifying and analyzing flaky tests, making it easier to take control of this problem and eradicate flaky tests.

This functionality extends the recently introduced Tests Dashboard as well as build scans for Gradle and Maven.

Identifying flaky tests

Develocity considers a test flaky if it fails and then succeeds within the same Gradle task or Maven goal execution. Any such tests are now indicated as FLAKY in build scans for Gradle and Maven.

Figure: Build scan with flaky test

Detection depends on retrying failed tests within the build, which is a simple, effective, and immediate way to identify flaky tests.

Common JVM test execution frameworks such as JUnit and TestNG provide mechanisms for retrying tests, typically requiring extra code to annotate tests that are known to be flaky. Enabling test retry in the build, by contrast, requires no code changes and applies to your entire test suite. A key benefit is proactive detection of newly introduced flaky tests.

Maven’s Surefire and Failsafe test execution plugins allow retrying failed tests. While Gradle does not provide this functionality out of the box, the new Test Retry Gradle plugin can be used.

Test Retry Gradle Plugin

This plugin is developed by the Gradle team and is available from the Gradle plugin portal. It can be used with or without Develocity.

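plugins {
    id "org.gradle.test-retry" version "1.0.0"
}
test {
    retry {
        // Fail the build even when a test passes on retry (default: false)
        failOnPassedAfterRetry = true
        // Retry each failed test at most once
        maxRetries = 1
        // Stop retrying once this many tests have failed
        maxFailures = 42
    }
}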

To get started with the plugin or to learn more, take a look at the project on GitHub.

By default, the plugin considers a test to have passed if it passes after being retried (this can be changed by setting failOnPassedAfterRetry = true as above). While this dulls some of the pain of flaky tests, since they will now rarely fail builds, it is not a complete solution. If passing retries are never reviewed, flaky tests go unnoticed and more of them inevitably accrue. You must still identify and fix flaky tests. The Develocity Tests Dashboard helps with exactly this.
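As a rough illustration of the trade-off described above, the following build script sketch enables retries only on CI and leaves failOnPassedAfterRetry at its default. A test that passes on retry then does not fail the build, but it can still be flagged as FLAKY in a build scan. The CI environment variable, the isCiServer name, and the specific limits are assumptions chosen for illustration, not recommendations from the plugin's documentation.

```groovy
plugins {
    id "java"
    id "org.gradle.test-retry" version "1.0.0"
}

// Assumption: the CI server exports a CI environment variable; adjust for your setup.
def isCiServer = System.getenv("CI") != null

test {
    retry {
        // Retry only on CI so that local failures surface immediately.
        maxRetries = isCiServer ? 2 : 0
        // Stop retrying once this many tests have failed; widespread
        // failure is more likely a real breakage than flakiness.
        maxFailures = 10
        // Default behavior: a test that passes on retry does not fail the build.
        failOnPassedAfterRetry = false
    }
}
```

Keeping retries off for local runs is one possible design choice. It avoids masking a newly introduced flaky test on the developer's machine, while CI builds stay stable and continue to record the flakiness.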

Analyzing flaky tests

The Develocity Tests Dashboard now visualizes the most severe flaky tests across your builds, making it much easier to measure the problem and prioritize fixing efforts by clearly identifying the worst offenders.

The Tests Dashboard is available in 2019.5 as a partial preview of a larger set of testing-oriented functionality that will be available as an add-on package in upcoming Develocity versions. Depending on your usage license, this new functionality may not be available to your installation when it is no longer in feature preview. If you have questions regarding this matter, please contact Develocity support.

The default view shows the test classes that most often have a flaky test in a build. From here you can drill into a class to see its offending tests, and then to recent build scans of builds where the test was executed.

You can also analyze specific classes or groups of classes (e.g. packages) via the search field.

Figure: Visualize most flaky tests

The over-time visualizations also allow you to monitor the resolution of a flaky test, confirming whether an attempted fix really resolved the flakiness.

By routinely using the Tests Dashboard to identify the worst flaky tests in need of action, you can take control of the situation and steadily reduce the problem. If you’re fortunate enough to not be burdened with many flaky tests, you can use the Tests Dashboard to ensure things stay that way.

Non-determinism in tests needs to be eliminated

We hope the new Test Retry Gradle Plugin and new flaky test analysis features in Develocity 2019.5 help you find and fix flaky tests quickly.

If you’re interested but not already using Develocity, you can try it for free by requesting a trial.

Stay tuned for more blog posts about new analytics features coming in each Develocity release.