Developer Productivity Engineering Blog

Seven Reasons You Should Not Ignore Flaky Tests

Imagine this scenario. You have good automated test coverage of your application code, you run your tests locally, and you have a Continuous Integration (CI) environment which runs your tests regularly. You’re doing everything right—right? Except… sometimes tests fail and you’re not sure why—whether they pass or fail doesn’t seem to be related to any code changes. Over time, your team learns these are the “flaky tests”, and begins to ignore them when they fail. But ignoring test failures can ultimately result in the development of a lower-quality product fraught with uncaught bugs. This post explains why you should care about flaky tests and what you can do to better diagnose failures and increase your team’s confidence in tests.

Why should we care about flaky tests?

Flaky tests may seem like a minor inconvenience: we often learn to identify which tests occasionally (or frequently) fail for no good reason and pay them less attention. However, it’s crucial to understand their impact on the broader development process, because a flaky test both consumes valuable developer time and creates uncertainty about the whole test suite.

Ideally, tests should offer a safety net, giving you a reliable signal that your changes to the code are safe. However, when tests flicker between pass and fail states, they become a source of confusion rather than a useful signal, which undermines both their utility and their credibility.

Worst of all, ignoring test failures can result in the development of a lower-quality product fraught with uncaught bugs.

Benefits of addressing flaky tests

Addressing flaky tests can have a profound positive impact on your team and product. Here’s how:

1.  Paves the way to improve individual developer productivity

Flaky tests introduce a level of unpredictability that can drag us out of our state of flow, limit creative productivity, and slow the pace of development. On one build a test passes, on the next it fails, with no relevant changes made to the codebase in the interim. This inconsistent behavior can create a fog of confusion, leading us down time-consuming rabbit holes trying to figure out where we went wrong with our seemingly unrelated code changes. But when we have confidence in our tests, we also have confidence that time spent troubleshooting is time well spent.

2.  Results in more time to do what we do best

Flaky tests introduce phantom problems that drain our time and energy, pulling us away from more constructive (and more interesting) activities like creating new features or refining existing code. Or we might spend time looking for bugs that the automated tests did flag, but whose failures were ignored because of prior flaky results. When we have fewer flaky tests taking up our attention, we can spend that time and energy on more fulfilling tasks.

3.  Restores confidence in the tests

Reliable tests are a developer’s ally. When there are no flaky tests, faith in the test suite is restored. And with this renewed trust, we can fearlessly modify the codebase, knowing that when a test fails, it points to a real bug or issue that we inadvertently introduced with our changes.

4.  Boosts team morale

Flaky tests can be a persistent source of frustration, which can ultimately lead to a decline in team morale. When erratic tests are eliminated, the development process runs much more smoothly, resulting in a happier and more motivated team.

5.  Makes better use of resources

Intermittently failing tests may be consuming cycle time that could be avoided, both locally and in CI. They may be timing out. They are likely being re-run (sometimes more than once) to check whether they’re really failing. Eliminating this flakiness should mean fewer test re-runs, and probably fewer builds, locally and in CI.

6.  Reveals hidden issues in production code

While flaky tests may seem like a nuisance caused by poorly written tests, they can sometimes be the canary in the coal mine, indicating deeper issues within the production code. Fixing flaky tests can reveal subtle, previously unnoticed bugs or opportunities for optimization.

7.  Improves software quality

Addressing flaky tests not only improves the development process but also enhances the overall quality of the software. When a reliable test fails, we can identify the cause of that failure and fix it, ensuring the final product’s stability and dependability. A higher-quality product leads to increased user satisfaction and trust.

Fixing flaky tests should be a priority if you want to avoid undermining your mission-critical testing efforts and investments. Why? A flaky test is worse than no test at all. Either the test fails for no good reason and we waste time looking for problems with our code that don’t exist, or the test fails for a genuine reason and we simply ignore it because of its flakiness, which could have real consequences for the quality of our product.
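To make the distinction between "fails for no good reason" and "fails for a genuine reason" a little more concrete, here is a minimal sketch in Python. It assumes you can export test outcomes as simple records, and it treats a test as flaky when it has both passed and failed against the same commit, since by definition no code changed between those runs. The function name `find_flaky_tests`, the record shape, and the sample test names and commit IDs are all hypothetical, and this is not how any particular tool detects flakiness.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return the names of tests that both passed and failed on the same commit.

    `runs` is an iterable of (test_name, commit, passed) tuples.
    This record shape is an assumption made for illustration.
    """
    outcomes = defaultdict(set)
    for test_name, commit, passed in runs:
        # Collect every distinct outcome seen for this test on this commit.
        outcomes[(test_name, commit)].add(passed)

    # A test is flagged only if one commit produced both True and False.
    flaky = {test for (test, _commit), seen in outcomes.items() if seen == {True, False}}
    return sorted(flaky)


# Hypothetical history: the test names and commit IDs are made up.
runs = [
    ("test_login", "abc123", True),
    ("test_login", "abc123", False),    # same commit, different outcome
    ("test_checkout", "abc123", False),
    ("test_checkout", "abc123", False), # consistently failing, not flaky
    ("test_search", "abc123", True),
    ("test_search", "def456", False),   # outcome changed along with the code
]

flaky = find_flaky_tests(runs)
print(flaky)
```

Note that `test_search` is deliberately not flagged: its outcome changed between two different commits, which may well be a genuine regression and deserves investigation rather than a "flaky" label.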

Taking steps to address flaky tests

As I’ve described, flaky tests can have a profound impact on developer productivity and software quality. By taking the time to identify and fix these tests, we can unlock a host of benefits—from time and resource optimization to discovering hidden issues in the production code. Prioritizing and fixing flaky tests should be an essential part of any software development strategy.

Develocity has multiple capabilities for managing flaky tests. Here are some next steps you and your team can take to start addressing flaky tests right away: