How useful are code coverage reports?
Nov 5, 2022
This post is going to talk about code coverage reports from unit tests. However, to provide some context first, I’m going to reminisce about 2010.
A story
Back in 2010, I was working for Plusnet, an ISP in the UK. The engineering team were still getting to grips with unit testing our code. Time and time again we saw the same pattern: many tests written, high coverage numbers, and happy management. However, when you opened one of those tests, you found that it ran and executed code but did not assert anything. Tests like that are little more than a linter. So I wrote the --assert-strict feature in PHPUnit (which has subsequently become fail-on-incomplete). This helped us raise the quality of our tests at Plusnet.
Why mention this? Opinions mostly come from experience, and I wanted to share mine before talking about coverage reports.
Tools
I value coverage reports. They are a tool in the toolbox, not the only tool in the toolbox! They are also not perfect (nothing is).
When I started a new job this year I found the project had thousands of tests, which is lovely. As a new engineer, I wanted to see what that looked like in terms of coverage. Did we have thousands of tests and high coverage, or thousands of tests and low coverage? Why does it matter? It gives me the confidence to make changes in the code. Once I have the confidence, it unlocks the ability to move at pace.
Getting the coverage report was not easy, as we lack test suites to help figure this out. I briefly mentioned we could work on this, and a colleague replied that coverage reports aren’t helpful and that folks end up chasing the metric.
At this point, I want to agree with the latter statement. I value coverage reports, but I do not think you should chase unrealistic metrics.
Many testing tools allow you to ignore code in coverage reports. I’m genuinely against this concept, because hiding code from the report undermines the confidence I want to gain from it. I’m personally not after 100%. I’m after the highest coverage you can get, for the maximum benefit, and no more. If there is a gap in coverage, I’m fine with that, as long as it’s clear and upfront.
It’s also worth noting that bikeshedding over the last 5% of code coverage is more than likely not going to add overall value. You will get diminishing returns, in my opinion.
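For readers who have not met the ignore mechanism mentioned above, this is roughly what it looks like. The sketch assumes Python and coverage.py, whose default exclusion marker is the comment `# pragma: no cover`; other tools have their own equivalents. Both functions are hypothetical.

```python
def charge_card(amount_pence):
    """Hypothetical function that is covered by tests."""
    if amount_pence <= 0:
        raise ValueError("amount must be positive")
    return {"status": "charged", "amount_pence": amount_pence}


def refund_card(amount_pence):  # pragma: no cover
    """Hypothetical function hidden from the coverage report.

    With the marker on the def line, coverage.py leaves the whole function
    out of the report. The percentage goes up, but the code is exactly as
    untested as it was before.
    """
    if amount_pence <= 0:
        raise ValueError("amount must be positive")
    return {"status": "refunded", "amount_pence": amount_pence}
```

Leaving the marker off keeps the gap visible in the report, which is the "clear and upfront" outcome argued for here.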
Metrics
So, back to coverage reports. Line coverage is handy, but try to get a branch coverage report as well. This shows which outcomes of each decision point in the code have been exercised, not just which lines ran. That’s two metrics that can and should be used alongside the many other metrics you can get from your code, e.g. cyclomatic complexity and static analysis. No single metric can tell the entire story. Let’s take a well-known example from the past that no one uses anymore. If you judged a software engineer on one single metric, lines of code written, you would not get a realistic picture of that engineer!
Software is no different: you need to weigh all these metrics together. Each metric provides a different lens to look through in terms of quality.
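As a small illustration of why branch coverage adds information over line coverage, consider the hypothetical Python function below. A single test with `is_member=True` executes every line, so line coverage reads 100%, but the case where the `if` body is skipped is never exercised. A branch-aware report (for example coverage.py run with its `--branch` option, assuming that is your tool) would flag the gap.

```python
def shipping_cost_pence(is_member):
    """Hypothetical rule: members ship free, everyone else pays 499p."""
    cost = 499
    if is_member:
        cost = 0
    return cost


def test_member_ships_free():
    # Runs every line of shipping_cost_pence: 100% line coverage.
    # The is_member=False outcome of the `if` is never exercised.
    assert shipping_cost_pence(True) == 0


def test_non_member_pays():
    # Adding this test covers the remaining branch.
    assert shipping_cost_pence(False) == 499
```

With only the first test, a line report looks complete while a branch report does not, which is the kind of difference in lens described above.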
So, meh, does it really matter?
Importance
Well, I would argue that if you don’t have visibility of what code your tests are covering, you’re now going to rely on monitoring and analytics platforms to identify quality issues with your code. When we say monitoring and analytics platforms, we actually mean our users. Our users are doing things on our platform, and the monitoring systems alert the engineers when things fail.
This week, as a relevant anecdote, we saw some notifications from Rollbar that showed us something was wrong with a new feature. We managed to get a coverage report for a single module of our code (it took a little while), and we saw that the code behind the bugs reported in Rollbar was not covered by tests.
I personally want to know about this in development, not from our users via Rollbar. For one thing, it’s more costly to fix at that stage, and it also impacts our customers, which isn’t great. It’s also stressful working in an environment where you know you have production issues impacting real people.
Summary
In conclusion, this is how I would summarise my views:
- I value a holistic set of data points that help us understand quality in software development.
- Code coverage is a single metric that can be part of that set of metrics you monitor.
- No single metric can stand by itself, and be meaningful.
- Nothing is perfect, which is why we should value a toolbox.
- I don’t believe in gaming the system and “hiding” uncovered code to get to 100%.
- You need engineering teams who are prepared and confident enough to publicly share their coverage reports. This sets the tone of the culture.
- Context is needed, always.
  - There will be reasons why the coverage is as it is.
- Use tools that help engineering teams build confidence, deliver at pace and ultimately deliver customer satisfaction.
- You cannot compare reports from different teams or projects.
  - The same is more than likely true of most metrics.
- Striving to improve is a mindset we should engage with, so use tools that help that improvement.