How useful are code coverage reports?

Nov 5, 2022

This post is going to talk about code coverage reports from unit tests. However, to provide some context first, I’m going to reminisce about 2010.

A story

Back in 2010, I was working for Plusnet, an ISP in the UK. The engineering team was still getting to grips with unit testing our code. Time and time again we saw many tests being written, high coverage numbers, and a happy management. However, when you opened up a test, you saw that it ran and executed code but asserted nothing. In effect, the tests were little more than a linter. So I wrote the --assert-strict feature in PHPUnit (which has subsequently become fail-on-incomplete). This helped us raise the quality of our tests at Plusnet.

Why mention this? Opinions mostly come from experience, and I wanted to share mine before talking about coverage reports.


I value coverage reports. They are a tool in the toolbox, not the only tool in the toolbox! They are also not perfect (nothing is).

When I started a new job this year I found the project had thousands of tests, which is lovely. As a new engineer, I wanted to see what that looked like in terms of coverage. Did we have thousands of tests and high coverage or thousands of tests and low coverage? Why does it matter? For me, it gives me the confidence to make changes in the code. Once I have the confidence, it unlocks the ability to move at pace.

Getting the coverage report is not easy, as we lack test suites to help figure this out. I briefly mentioned we could work on this, and a colleague replied that coverage reports aren’t helpful and that folks end up chasing the metric.

At this point, I want to agree with the latter statement. I value coverage reports, but I do not think you should chase unrealistic metrics.


Many testing tools allow you to ignore code in coverage reports. I’m genuinely against this concept because I want to gain confidence. I’m personally not after 100%. I’m after the highest coverage you can get, for the maximum benefit, and no more. If there is a gap in coverage, I’m fine with that, as long as it’s clear and upfront.
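As one example of what "ignoring code" looks like, here is a sketch using the `# pragma: no cover` comment that coverage.py recognises. The post does not name a specific tool, so the choice of coverage.py is an assumption, and load_settings is a hypothetical function.

```python
def load_settings(path: str) -> dict:
    """Parse simple key=value lines from a file into a dictionary."""
    try:
        with open(path, encoding="utf-8") as handle:
            pairs = (line.strip().split("=", 1) for line in handle if "=" in line)
            return dict(pairs)
    except OSError:  # pragma: no cover
        # coverage.py leaves this whole except block out of the report,
        # so the untested failure path no longer shows up as a gap.
        return {}
```

The exclusion raises the reported percentage without adding a single test. The fallback path is still untested, it is just no longer visible, which is the opposite of keeping gaps clear and upfront.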

It’s also worth noting that bikeshedding over the last 5% of code coverage is more than likely not going to add overall value. In my opinion, you will get diminishing returns.


So, back to coverage reports. Line coverage is handy, but try to get a branch coverage report as well. This shows which branches at each decision point your tests actually take, not just which lines they touch. That’s two metrics that can and should be used alongside the many other metrics you can get from your code, e.g. cyclomatic complexity and static analysis. No single metric can tell the entire story. Let’s take a well-known example from the past that no one uses anymore. If you judge a software engineer on one single metric, lines of code written, you are not getting a realistic picture of that engineer!

Software is no different: you need to weigh all these metrics together. Each metric provides a different lens to look through in terms of quality.
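The difference between the line coverage and branch coverage lenses can be sketched with a small function. The name shout_name and the dictionary shape are hypothetical, chosen only to illustrate the point.

```python
def shout_name(user: dict) -> str:
    """Return the user's name in upper case."""
    name = None
    if user.get("name"):
        name = user["name"]
    # When the `if` above is skipped, name is still None here.
    return name.upper()


def test_shout_name_with_name() -> None:
    # This single test executes every line of shout_name, so line
    # coverage is 100%. The `if` is never false though, so a branch
    # coverage report would flag the untested jump from `if` to `return`.
    assert shout_name({"name": "ada"}) == "ADA"
```

Calling shout_name({}) takes the untested branch and raises an AttributeError, because None has no upper method. Line coverage alone gives no hint that this case was never exercised.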

So, meh, does it really matter?


Well, I would argue that if you don’t have visibility of what code your tests are covering, you end up relying on monitoring and analytics platforms to identify quality issues with your code. When I say monitoring and analytics platforms, I actually mean our users. Our users are doing things on our platform, and the monitoring systems alert the engineers when things fail.

This week, as a relevant anecdote, we saw some notifications from Rollbar that showed us something was wrong with a new feature. We managed to get a coverage report for a single module of our code (which took a little while), and we saw that the code behind the bugs Rollbar had flagged was not covered by tests.

I personally want to know about this in development, not from our users via Rollbar. For one thing, a bug is more costly to fix once it is in production, and it also impacts our customers, which isn’t great. It’s also stressful working in an environment where you know you have production issues impacting real people.


In conclusion, this is how I would summarise my views on this.
