Analyzing unit-ness of white-box tests using OpenCover

In this article, I will run an analysis over the Roslyn codebase to demonstrate the difference between unit and integration tests. But first, I need to explain some nomenclature problems I have observed in the wild… I promise there will be some code eventually.

Oh, by the way, this blog post is a part of the First C# Advent Calendar. Check out the other texts there!


There is a class of tests that share the following characteristics:

  • We write them in our IDEs, usually in the same language we are writing our code.
  • We write them against our classes/functions, not against the network interface of a service we are writing.
  • We run them during CI, but also from our IDE, sometimes very often.
  • The way they are run is orchestrated by tools like NUnit, xUnit, MSTest, …

They are really very common and although you might dispute particular characteristics as I described them (“I test C# code using F#!”), you know very well the concept I’m describing.

These are not necessarily unit tests. You can test database integration this way, and it’s not an antipattern at all to run a test suite against a repository layer backed by a real database.

They are called “unit tests” in many places, though.

For lack of a better word, I’ll call them white-box tests, because their crucial property is that they are run from inside the codebase and they see and test internal structures of the code. I think it would be much better to call NUnit a “white-box test framework” than a “unit test framework”, but I’m sure there is an even better name out there somewhere.

To the point

The reason this was not mere nitpicking is that it’s really important to see this difference when reasoning about testing.

We can use it to rehabilitate tests that are not unit-y but are still useful for exercising abstractions not visible from the outside. Let’s return to the case of the repository layer: if it’s written in Dapper, there are not many choices for checking that it does what it should other than running it against a real database.

By saying “let’s make sure we check all database interactions by calling proper API endpoints in integration tests”, we are losing the opportunity to harden repository interfaces. Even if we win the battle with the combinatorial explosion that happens in integration testing, we will get repositories that are guaranteed to work only for a particular set of situations, breaking as soon as a caller does something slightly different with the offered interface.

Even more to the point of this article: after admitting that “unit test runners” run all kinds of tests, we can now analyze how unit-y our so-called “unit tests” actually are. With some amount of simplification, we are going to say that tests that cover large parts of the SUT are integration-y and tests that cover only a few lines are unit-y.

Can we measure coverage of test runs in a way that would disclose whether the codebase is covered by a single integration test spanning its execution across thousands of lines, or by many unit tests each checking just a few places? Sure! That’s exactly what’s going to be done here.
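Before touching real data, here is the idea in miniature: a Python sketch with an invented test-to-lines mapping (the function name and the data are mine, not OpenCover’s). Only tests that visit at most a given number of sequence points count towards coverage.

```python
def unitness_coverage(visits, cut_point):
    """Coverage counting only tests that visit at most cut_point sequence points.

    visits: dict mapping a test id to the set of sequence points it touched.
    """
    covered = set()
    for points in visits.values():
        if len(points) <= cut_point:
            covered |= points
    total = set().union(*visits.values())
    return len(covered) / len(total)

# Two tiny unit tests plus one sweeping integration test (hypothetical data):
visits = {
    "unit_a": {1, 2},
    "unit_b": {3},
    "integration": {1, 2, 3, 4, 5, 6, 7, 8},
}
print(unitness_coverage(visits, 3))  # → 0.375: only the two unit tests count
```

Raising cut_point to 8 would admit the integration test and push the value to 1.0 — which is exactly the shape of curve we will plot later.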

Digging into the output of OpenCover

OpenCover is an open-source tool for computing code coverage of .NET software. You let it observe a test suite run and it will produce a lengthy XML file with an analysis of the run: besides usual metrics like the code coverage percentage, it mentions every visited sequence point, something that very roughly corresponds to a line of source code in the SUT.

It has a command line argument coverbytest that enhances the standard output with notes on which sequence point was visited by which test, fully describing the bipartite graph of the line-visited-by-test relation we are interested in. The relevant part of the output then looks like this:

          <Summary numSequencePoints="49" visitedSequencePoints="41" numBranchPoints="18" visitedBranchPoints="14" sequenceCoverage="83.67" branchCoverage="77.78" maxCyclomaticComplexity="9" minCyclomaticComplexity="1" visitedClasses="1" numClasses="1" visitedMethods="4" numMethods="4" />
            <Method visited="true" cyclomaticComplexity="9" nPathComplexity="32" sequenceCoverage="83.78" branchCoverage="81.82" isConstructor="false" isStatic="false" isGetter="false" isSetter="false">
              <Summary numSequencePoints="37" visitedSequencePoints="31" numBranchPoints="11" visitedBranchPoints="9" sequenceCoverage="83.78" branchCoverage="81.82" maxCyclomaticComplexity="9" minCyclomaticComplexity="9" visitedClasses="0" numClasses="0" visitedMethods="1" numMethods="1" />
              <Name>CSharpFormatting.Common.AnnotationResult CSharpFormatting.Parsing.Roslyn.CSharpParser::Parse(System.String,System.String)</Name>
              <FileRef uid="55" />
                <SequencePoint vc="18" uspid="307" ordinal="0" offset="0" sl="15" sc="9" el="15" ec="10" bec="0" bev="0" fileid="55">
                    <TrackedMethodRef uid="10" vc="7" />
                    <TrackedMethodRef uid="17" vc="1" />
                    <TrackedMethodRef uid="4" vc="1" />
                    <TrackedMethodRef uid="18" vc="1" />
                    <TrackedMethodRef uid="2" vc="1" />
                    <TrackedMethodRef uid="16" vc="1" />
                    <TrackedMethodRef uid="13" vc="1" />
                    <TrackedMethodRef uid="7" vc="1" />
                    <TrackedMethodRef uid="6" vc="1" />
                    <TrackedMethodRef uid="15" vc="1" />
                    <TrackedMethodRef uid="9" vc="1" />
                    <TrackedMethodRef uid="12" vc="1" />
                <SequencePoint vc="18" uspid="308" ordinal="1" offset="1" sl="16" sc="13" el="16" ec="49" bec="2" bev="1" fileid="55">
                    <TrackedMethodRef uid="10" vc="7" />
                    <TrackedMethodRef uid="17" vc="1" />
                    <TrackedMethodRef uid="4" vc="1" />
                    <TrackedMethodRef uid="18" vc="1" />

It means that the sequence point #307 from the SUT was visited by test #10 (seven times), by test #17 (once), #4 (once), …

(You can see I ran it first on my little project CSharpFormatting, which displays C# code in HTML with IntelliSense-like tooltips. Coming soon.)
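Pulling this relation out of a well-formed fragment of that shape takes only a few lines; a Python sketch with invented ids (real OpenCover output is of course much larger):

```python
import xml.etree.ElementTree as ET

# A tiny fragment in the shape of the excerpt above (ids invented):
fragment = """
<SequencePoints>
  <SequencePoint vc="18" uspid="307">
    <TrackedMethodRef uid="10" vc="7" />
    <TrackedMethodRef uid="17" vc="1" />
  </SequencePoint>
</SequencePoints>
"""

# Collect (sequence point, test, visit count) edges of the bipartite graph.
edges = []
for sp in ET.fromstring(fragment).iter("SequencePoint"):
    for tm in sp.findall("TrackedMethodRef"):
        edges.append((sp.get("uspid"), tm.get("uid"), int(tm.get("vc"))))

print(edges)  # → [('307', '10', 7), ('307', '17', 1)]
```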

Analyzing C# compiler codebase

This is exactly the kind of information we need. Running it against my toy codebases is not as much fun as seeing it in action on something bigger. The Roslyn source code is a good candidate: it’s a big, production-quality open-source project written by reasonable people. And the domain it solves is not some fuzzy enterprisey integration of iframes with credit card data: it’s compilation, fundamentally a very testable process.

Roslyn is quite easy to build and test locally: just clone the repo and, if you meet some minimal system requirements, you can run Restore.cmd, Build.cmd and Test.cmd (or their Linux counterparts) in quite quick succession. It seems easy to wrap the last invocation to get OpenCover data:

OpenCover.4.6.519\tools\OpenCover.Console.exe -register:user -target:"./Test.cmd" -filter:"+[*]*" -coverbytest:"*" -output:opencoveroutput.xml

Alas, this fails, because PDBs are not generated during a standard Roslyn build. We need to do two things:

  • Set the infrastructure to generate them using a change like this:
--- a/build/Targets/Settings.props
+++ b/build/Targets/Settings.props
@@ -122,8 +122,8 @@
       Developer build:
        - Embed PDBs to be consistent with Jenkins builds.
-    <RoslynDebugType Condition="'$(OfficialBuild)' == 'true'">portable</RoslynDebugType>
-    <RoslynDebugType Condition="'$(OfficialBuild)' != 'true'">embedded</RoslynDebugType>
+    <RoslynDebugType Condition="'$(OfficialBuild)' == 'true'">full</RoslynDebugType>
+    <RoslynDebugType Condition="'$(OfficialBuild)' != 'true'">full</RoslynDebugType>

       The source root path used for deterministic normalization of source paths.

(Thanks for the hint go to the Roslyn team, helping another random stranger trying to analyze their software!)

  • Distribute the generated PDBs to unit test projects – as it turns out, they are not copied. This probably could be done by a similar kind of build system tweaking as in the previous step, but as I’m not very familiar with the Roslyn codebase, I chose a quicker, dirtier way in PowerShell:
$pdbForDllHash = @{};
foreach ($pdb in Get-ChildItem . -Recurse -Filter *.pdb)
{
    $correspondingDll = $pdb.FullName.Replace(".pdb", ".dll");
    if (-not (Test-Path $correspondingDll)) { continue; }

    $dllHash = (Get-FileHash $correspondingDll).Hash;
    if (-not $pdbForDllHash.ContainsKey($dllHash))
    {
        $pdbForDllHash.Add($dllHash, $pdb.FullName);
    }
}

foreach ($dll in Get-ChildItem . -Recurse -Filter *.dll)
{
    $correspondingPdb = $dll.FullName.Replace(".dll", ".pdb");
    if (Test-Path $correspondingPdb) { continue; }

    $dllHash = (Get-FileHash $dll.FullName).Hash;
    if ($pdbForDllHash.ContainsKey($dllHash))
    {
        "$($pdbForDllHash[$dllHash]) -> $correspondingPdb";
        cp $pdbForDllHash[$dllHash] $correspondingPdb;
    }
}

Now, after running the already mentioned OpenCover command, we get a monstrous XML output (almost 6 GB!) full of interesting data. We can’t load it into memory in its entirety.
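For a file of that scale, the usual trick in any language is streaming: visit each element once and discard it immediately, so memory use stays flat regardless of file size. A minimal Python illustration of the pattern (the input here is a stand-in, not real OpenCover output):

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a huge coverage file; in reality this would be a file path.
data = io.BytesIO(b'<Modules><Module hash="a"/><Module hash="b"/></Modules>')

hashes = []
for event, elem in ET.iterparse(data, events=("end",)):
    if elem.tag == "Module":
        hashes.append(elem.get("hash"))
        elem.clear()  # free the subtree so memory stays flat

print(hashes)  # → ['a', 'b']
```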

Let’s resort to the slightly dated .NET XML streaming APIs here, although I’m sure there is an elegant Haskell library somewhere that would allow me to work with huge XMLs in a DOM-like fashion. At least we can use a nice .NET Core csx runner:

#! "netcoreapp2.0"
#r "nuget:NetStandard.Library,2.0.0"

using System.Xml;

/// <summary>
/// Sequence point ID.
/// </summary>
struct SeqId
{
    public int Value { get; }

    public SeqId(int value) => Value = value;

    public override int GetHashCode() => Value;

    public override bool Equals(object obj) => obj != null && obj is SeqId && ((SeqId)obj).Value == this.Value;
}

struct TestId
{
    public int Value { get; }

    public TestId(int value) => Value = value;

    public override int GetHashCode() => Value;

    public override bool Equals(object obj) => obj != null && obj is TestId && ((TestId)obj).Value == this.Value;
}

class ModuleCoverage
{
    public string ModuleName { get; set; }

    public int? NumSequencePoints { get; set; }

    public int? VisitedSequencePoints { get; set; }

    public Dictionary<TestId, HashSet<SeqId>> Visits { get; set; }
        = new Dictionary<TestId, HashSet<SeqId>>();
}

// Extracting relevant data from the OpenCover output

var modules = new List<ModuleCoverage>();

using (var xml = XmlReader.Create("C:/Source.git/roslyn/opencoveroutput3.xml"))
{
    SeqId? currentSeqId = null;

    while (xml.Read())
    {
        if (xml.NodeType != XmlNodeType.Element)
        {
            continue;
        }

        if (xml.Name == "Module")
        {
            if (xml.GetAttribute("skippedDueTo") == null)
            {
                modules.Add(new ModuleCoverage());
            }
            else
            {
                xml.Skip(); // if skipped, then skip
            }
        }

        if (xml.Name == "ModuleName")
        {
            modules.Last().ModuleName = xml.ReadElementContentAsString();
        }

        if (   xml.Name == "Summary"
            && modules.Any()
            && !modules.Last().NumSequencePoints.HasValue)
        {
            xml.MoveToAttribute("numSequencePoints");
            modules.Last().NumSequencePoints = xml.ReadContentAsInt();
            xml.MoveToAttribute("visitedSequencePoints");
            modules.Last().VisitedSequencePoints = xml.ReadContentAsInt();
        }

        if (xml.Name == "SequencePoint")
        {
            xml.MoveToAttribute("uspid");
            currentSeqId = new SeqId(xml.ReadContentAsInt());
        }

        if (   xml.Name == "TrackedMethodRef"
            && currentSeqId.HasValue)
        {
            xml.MoveToAttribute("uid");
            var currentTestId = new TestId(xml.ReadContentAsInt());

            var currentModule = modules.Last();
            if (currentModule.Visits.ContainsKey(currentTestId))
            {
                currentModule.Visits[currentTestId].Add(currentSeqId.Value);
            }
            else
            {
                currentModule.Visits.Add(currentTestId, new HashSet<SeqId>{ currentSeqId.Value });
            }
        }
    }
}

// Merging modules observed from multiple test assemblies

modules = modules.GroupBy(m => m.ModuleName).Select(mg => {
    var visitsGrouping = mg.SelectMany(m => m.Visits.AsEnumerable()).GroupBy(kv => kv.Key);
    var visits = visitsGrouping.ToDictionary(
        kvg => kvg.Key,
        kvg => new HashSet<SeqId>(kvg.SelectMany(kv => kv.Value)));

    return new ModuleCoverage()
    {
        ModuleName = mg.Key,
        NumSequencePoints = mg.First().NumSequencePoints,
        VisitedSequencePoints = 0,
        Visits = visits
    };
}).ToList();
// Computing partial coverages for tests limited by increasingly growing maximum size

foreach (var module in modules)
{
    if (module.ModuleName.EndsWith(".UnitTests"))
    {
        continue; // skip the test assemblies themselves
    }

    var totalSeqPoints = module.Visits.Values.SelectMany(v => v).Distinct().Count();

    if (totalSeqPoints < 5000)
    {
        continue; // too small to be interesting
    }

    Console.WriteLine(Environment.NewLine + module.ModuleName + Environment.NewLine);

    var thresholdExponentMaximum = 10M;
    var thresholdMaximum = Math.Pow(2.0, (double)thresholdExponentMaximum);
    for (var thresholdExponent = 0.1M; thresholdExponent <= thresholdExponentMaximum; thresholdExponent += 0.1M)
    {
        var threshold = Math.Pow(2.0, (double)thresholdExponent);

        // What is the maximum number of points per test for this threshold?
        var cutPoint = (int)(threshold / thresholdMaximum * module.NumSequencePoints.Value);

        // What points are covered by tests with coverage under the current cutPoint?
        var visited = new HashSet<SeqId>();

        foreach (var v in module.Visits.Values.Where(v => v.Count <= cutPoint))
        {
            visited.UnionWith(v);
        }

        Console.WriteLine($"{cutPoint}, {visited.Count * 10000L / totalSeqPoints}");
    }
}

The script outputs the code coverage under more and more stringent restrictions on how many lines a test may check to count towards the coverage. It discovers answers to questions like “what’s the coverage if we count only tests that cover no more than 50 sequence points?” “OK, what about 60?” “70?” …
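The sweep itself can be sketched in Python (the module size is an assumed illustrative figure; the constants mirror the csx script above):

```python
# Cut points grow exponentially with the threshold exponent, which is what
# later makes the x-axis effectively logarithmic.
num_sequence_points = 5000     # assumed module size
threshold_exponent_max = 10.0  # same constant as in the csx script

cut_points = []
exponent = 0.1
while exponent <= threshold_exponent_max:
    threshold = 2.0 ** exponent
    # the largest number of sequence points a test may visit at this step
    cut_points.append(int(threshold / 2.0 ** threshold_exponent_max * num_sequence_points))
    exponent = round(exponent + 0.1, 1)  # rounding keeps the float steps exact

print(cut_points[0], cut_points[-1], len(cut_points))  # → 5 5000 100
```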

Its final output is relative on the y-axis: we do not care about absolute code coverage; we care about how it grows as we take more and more integration tests into account. The x-axis is absolute, because we don’t want the size of the codebase to change our meaning of “unit test”. It’s also on a logarithmic scale.

We can now visualize output using a script in R too banal to inline:

Unit-ness analysis of Roslyn codebase

We can see that Microsoft.CodeAnalysis.VisualBasic has slightly more unit-y tests than Microsoft.CodeAnalysis.CSharp, with Microsoft.CodeAnalysis somewhere in the middle.


Developers often don’t like code coverage reports because they see how the reduction of a complicated situation to a single number distorts reality. One way to game the scalar value usually produced is to write a few tests that touch many lines, something that would leave a mark in the output of the method presented here.

Nevertheless, “solving” this problem was not the intention here: it’s inherently problematic to try to decide which product should live or which programmer should be promoted based on coverage analysis. No method’s ability to indicate the truth about the state of the codebase can be protected against a programmer motivated to churn out tests that are unit-y (at least by our imperfect definition) but stupid and meaningless anyway.

So who should do this, and why? Programmers themselves, striving to understand their codebases better. In the same way stronger CI checks shorten the feedback loop on the correctness of the system, stronger analysis tools like this one shorten the feedback loop on the technical health of the solution.


You can find the repository with the code at lukas-lansky/unitness-analysis. The code isn’t very reusable; I’ll see if I can find some time to try to sell this idea to projects like ReportGenerator or SonarQube.
