CI: code coverage

It would be awesome if the CI could report the amount of code covered by the tests. We could start with native first. Collecting coverage statistics on hardware might be a little more challenging.

Does highly optimized C code even give any sensible mapping to code coverage? I already have trouble interpreting coverage reports in Python (e.g., does the condition in this list comprehension ever get triggered?); in a RIOT program with inlining and dead code elimination, there is even the intermediate question of “Does this even wind up in the binary?” before the question of “Is it reached at execution time?” makes sense.

(Not saying it can’t work, just curious to see how well it does.)
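To make the worry concrete, here is a minimal hypothetical sketch (all names invented for illustration, not RIOT code) of how optimization can blur a coverage report:

```c
/* Hypothetical example: compiled with -O2 and --coverage, the
 * compiler may inline checked_div() into main() and fold the
 * constant arguments, at which point the b == 0 branch can
 * disappear from the binary entirely. The coverage report for that
 * line then becomes hard to interpret: it was neither "hit" nor
 * meaningfully "missed". */
static inline int checked_div(int a, int b)
{
    if (b == 0) {
        return -1;   /* may not survive dead code elimination */
    }
    return a / b;
}

int main(void)
{
    return checked_div(10, 2);   /* likely folded to a constant */
}
```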

> Does highly optimized C code even give any sensible mapping to code coverage?

@cladmi and I spitballed some ideas around coverage testing way back. Basically, the conclusion was that you would most likely have to combine the reports of several, if not all, applications to get a sensible picture. Optimized code is the least of our problems when it comes to coverage, I think, as RIOT’s modularization already yields so many compilation paths in some cases.

From my experience, code coverage anyway only gives a rough pointer to whether some parts of the code are not covered at all. Whether one has 50% or 95% code coverage doesn’t tell you anything about the quality measures in place.

Yup. But I think that number is more interesting when restricted to specific modules. As in, 50% coverage of the total code base is of limited use as a number, but knowing that 99% of a CBOR parser module is covered means that the tests are somewhat exhaustive, for that module.

Also, relative changes are meaningful. If a new module is added, but code coverage goes down, maybe the tests should be extended.

Then again, code coverage is quite multi-dimensional in our case. If 50% of the libraries are covered on native and the other 50% on Cortex-M, do we have 100% coverage?

I also meant that code coverage doesn’t say anything about test quality. I can have 100% coverage for a CBOR parser with hardly any meaningful tests. But I think that some visible numbers (maybe using some traffic light system) would still look good (or bad?) to outsiders.
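For illustration (a self-contained hypothetical sketch, not an actual RIOT module or test), a “test” like the following reaches every line and branch, so coverage reports 100%, without checking a single result:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for a real CBOR module: returns the major
 * type from the initial byte (its top three bits), or -1 on empty
 * input. */
static int cbor_major_type(const uint8_t *buf, size_t len)
{
    if (len == 0) {
        return -1;
    }
    return buf[0] >> 5;
}

/* This "test" executes every line and branch above, so a coverage
 * report shows 100% -- yet it never asserts anything about the
 * returned values and passes unconditionally. */
int main(void)
{
    const uint8_t buf[] = { 0x61 };
    (void)cbor_major_type(buf, 0);   /* error branch */
    (void)cbor_major_type(buf, 1);   /* happy path   */
    return 0;
}
```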

> I also meant that code coverage doesn’t say anything about test quality.

Really? I’d agree that code coverage doesn’t guarantee higher test quality. But knowing which parts of the code are actually executed is valuable in itself, if only to guide the creation of more exhaustive tests.

Tests can never show the absence of bugs, only their presence. But using code coverage to guide writing tests that reach more branches in code is extremely useful. High code coverage hints at that being done.

I bet if we closed (or even just reduced) the holes in our code coverage, we’d hit fewer bugs “in the field”.

Functions that are not covered should give us reason to be alarmed.

Functions that are covered should give us no reason to relax.

@JulianHolzwarth recently had nice experiences where analyzing code coverage helped to improve software quality in RIOT.

cheers matthias

> @JulianHolzwarth recently had nice experiences where analyzing code coverage helped to improve software quality in RIOT.

Do you have a pointer to the results? That would be interesting information.

> I also meant that code coverage doesn’t say anything about test quality.

> Really? I’d agree that code coverage doesn’t guarantee higher test quality. But knowing which parts of the code are actually executed is valuable in itself, if only to guide the creation of more exhaustive tests.

Well, a test where the assertions/expectations contradict the documented behavior might still cover 100% of the code, but give 100% wrong results.
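A hypothetical sketch of exactly that failure mode (invented names, not RIOT code): the expectations below match the bug rather than the documentation, so the test passes with full coverage and is 100% wrong.

```c
#include <assert.h>

/* Documented behavior: saturate v to [lo, hi]. The implementation
 * is buggy and returns lo instead of hi for values above hi. */
static int clamp(int v, int lo, int hi)
{
    if (v < lo) {
        return lo;
    }
    if (v > hi) {
        return lo;   /* bug: should return hi */
    }
    return v;
}

int main(void)
{
    /* All three branches are hit: 100% coverage. But the second
     * expectation encodes the bug, contradicting the documentation. */
    assert(clamp(-1, 0, 10) == 0);
    assert(clamp(99, 0, 10) == 0);   /* should expect 10 */
    assert(clamp(5, 0, 10) == 5);
    return 0;
}
```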

I just want to strongly discourage chasing numbers. In all the projects I got to know during the last years where the customer/tech lead/project manager set a certain percentage goal for code coverage, the effort to reach this goal was high and the number of identified bugs that would have been missed otherwise was low. The situation probably improves if you set this number right from the beginning - in the context of RIOT, e.g., when someone develops a new module.

P.S. Splitting this into a new topic makes it hard to reply via mail. At least I don’t know how to do it.

I agree. Having 100% code coverage doesn’t mean that tests are correct or cover all use cases. In riot-generator, I could reach 100% coverage but was still able to find untested cases: each block was (well?!) tested independently but not necessarily when they were put together.
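A minimal hypothetical sketch of that effect (invented helpers, not riot-generator code): two functions that each pass their own fully covering unit test, yet break when combined, because they disagree on byte order.

```c
#include <assert.h>
#include <stdint.h>

static void put_u16(uint8_t *buf, uint16_t v)
{
    buf[0] = (uint8_t)(v >> 8);     /* writes big-endian */
    buf[1] = (uint8_t)(v & 0xff);
}

static uint16_t get_u16(const uint8_t *buf)
{
    return (uint16_t)(buf[0] | (buf[1] << 8));   /* reads little-endian */
}

int main(void)
{
    uint8_t buf[2];

    /* Unit tests: both pass, and both functions are 100% covered. */
    put_u16(buf, 0x1234);
    assert(buf[0] == 0x12 && buf[1] == 0x34);
    assert(get_u16((const uint8_t[]){ 0x34, 0x12 }) == 0x1234);

    /* Put together: the round trip is broken despite full coverage. */
    put_u16(buf, 0x1234);
    assert(get_u16(buf) != 0x1234);   /* yields 0x3412, not 0x1234 */
    return 0;
}
```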

Anyway, I still think that code coverage gives a rough view of how well the code is tested. It is more a global indicator of the amount of testing and, of course, it should not be blindly trusted.

I suppose he will report on that soon on GitHub.

cheers matthias

> P.S. Splitting this into a new topic makes it hard to reply via mail. At least I don’t know how to do it.

Isn’t each post getting its own reply-to, so just hitting reply does the right thing? (trying with email)