Building a Source Generator for C#

Key Takeaways

  • Source generators are a good way to reduce the amount of repetitive code that needs to be written.
  • Plan your source generator by first deciding how consuming projects will use it. 
  • Do not use a source generator if the input is something unreliable such as a database.
  • Learning the Roslyn syntax tree isn’t required, but it will give you more options.
     

In this article we’ll be writing a source generator for C#. Along the way we’ll explain some of the key technologies you’re going to need to learn in order to build your own and some of the pitfalls you might encounter on the way.

When starting a new source generator, the first question is, of course, "What are you trying to accomplish?".

But following closely behind is, "Does this make sense as a source generator?". This is an important question, since source generators run every time the code is recompiled.

So, if your code generator needs data from an external source, such as a database, a source generator is probably not appropriate.

For this walkthrough, we’ll be looking at the source code for Tortuga Test Monkey. It is available on GitHub under the MIT License.

Project Goal

When writing unit tests, many of the tests needed for coverage are ‘low value’; the odds of them actually detecting an error are maybe one in one hundred or less.

Here’s an example of testing property getters and setters to ensure they perform as intended:

[TestMethod]
public void @FirstName_SelfAssign()
{
    var objectUnderTest = CreateObject();
    var originalValue = objectUnderTest.@FirstName;
    objectUnderTest.FirstName = originalValue;
    Assert.AreEqual(originalValue, objectUnderTest.@FirstName, "Assigning a property to itself should not change its value.");
}
[TestMethod]
public void @FirstName_DoubleRead()
{
    var objectUnderTest = CreateObject();
    var readOnce = objectUnderTest.@FirstName;
    var readTwice = objectUnderTest.@FirstName;
    Assert.AreEqual(readOnce, readTwice, "Reading a property twice should return the same value both times.");
}

Nobody wants to write those tests because they are tedious and almost never detect a bug. But ‘almost never’ isn’t the same as ‘never’, so they should still be written.

What if we could use a code generator to create these low value tests? If we reduce them to just an attribute, we could get our test coverage while freeing up the developer to focus on the more difficult tests.

[TestClass]
[MakeTests(typeof(SimpleClass), TestTypes.All)]
public partial class Test_SimpleClass

In this walkthrough, we’ll explore how to create such a test generator.

Project Structure

At the bare minimum you need two projects, one for the source generator itself and one to test it against. For your source generator, you need to make the following additions to the project file.

First, you need to set the target framework to .NET Standard 2.0. This is required by the C# compiler, so it is not optional. Furthermore, all of your dependencies must likewise be for .NET Standard 2.0 or earlier.

<TargetFramework>netstandard2.0</TargetFramework>

Then add the CodeAnalysis libraries from NuGet.

<ItemGroup>
  <PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="3.9.0" PrivateAssets="all" />
  <PackageReference Include="Microsoft.CodeAnalysis.Analyzers" Version="3.3.2" PrivateAssets="all" />
</ItemGroup>

Then you are going to want to indicate this is an "analyzer". If you skip this step, your source generator won’t work when deployed as a NuGet package.

<ItemGroup>
  <None Include="$(OutputPath)\$(AssemblyName).dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
</ItemGroup>

Consumer Project Settings

Next you have to decide if the project consuming the source generator will be using it at runtime or only compile time. If it is only used at compile time and doesn’t have any classes the consuming project needs, then you can add the following.

<PropertyGroup>
  <IncludeBuildOutput>false</IncludeBuildOutput>
</PropertyGroup>

In our case, Tortuga.TestMonkey.dll includes a couple of public classes the consuming project needs, specifically an attribute and matching enum, so we will not be using the above setting.

If you look at the Tortuga.TestMonkey.csproj file on GitHub, you’ll also see a section titled "NuGet Packaging Boilerplate". This has nothing to do with source generators; it just makes publishing the NuGet package a little bit easier.

On the consumer side, you’ll need to reference the source generator. You can do this as a project reference or a package reference.

<ItemGroup Condition="'$(Configuration)'=='Debug'">
  <ProjectReference Include="..\Tortuga.TestMonkey\Tortuga.TestMonkey.csproj" OutputItemType="Analyzer" />
</ItemGroup>
 
<ItemGroup Condition="'$(Configuration)'=='Release'">
  <PackageReference Include="Tortuga.TestMonkey" Version="0.2.0" />
</ItemGroup>

In the above example, we show both options. This was done so we could test the NuGet package in release mode while still having the convenience of using a project reference while writing the source generator.

In order to see the results of your code generator, you will want to turn on EmitCompilerGeneratedFiles.

<PropertyGroup>
  <EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles>
  <CompilerGeneratedFilesOutputPath>Generated</CompilerGeneratedFilesOutputPath>
</PropertyGroup>

<ItemGroup>
  <!-- Don't include the output from a previous source generator execution into future runs; the */** trick here ensures that there's
  at least one subdirectory, which is our key that it's coming from a source generator as opposed to something that is coming from
  some other tool. -->
  <Compile Remove="$(CompilerGeneratedFilesOutputPath)/*/**/*.cs" />
</ItemGroup>

If you do not wish for your generated code to be checked in, add this line to your .gitignore file.

**/Generated/Tortuga.TestMonkey/*

For the purpose of this walkthrough, you’ll need to create a unit test project that will act as the consuming project. You’ll also want a project being tested. For your convenience, these are provided in the source code examples on GitHub.

The Source Generator

The source generator is defined using the Generator attribute and the ISourceGenerator interface. Both can be found in the Microsoft.CodeAnalysis namespace.

The Initialize method is used to register for two events, syntax notification and post-initialization. These are not EventHandler style events, so you may only have one of each. For this walkthrough, we’re only going to use RegisterForSyntaxNotifications. We’ll talk about the use case for RegisterForPostInitialization later in the article.

The syntax notification event needs an ISyntaxContextReceiver. This will be called after the compiler has constructed the syntax tree from the source code. If your source generator doesn’t need to analyze the source code, you can skip this step.

The Execute method is used to actually generate the source code. If you used a syntax receiver, this will run afterwards. We’ll return to this later in the article, but for now we only need this line:

// retrieve the populated receiver
if (!(context.SyntaxContextReceiver is SyntaxReceiver receiver))
    return;

The receiver variable is the same object we set up in the Initialize method.
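Putting the pieces described so far together, a generator skeleton might look like the following. This is a minimal sketch rather than the actual Tortuga Test Monkey source: the class name TestGenerator is inferred from the generated-file path shown later in this article, and the empty SyntaxReceiver is only a placeholder for the class built in the next section. It assumes the Microsoft.CodeAnalysis.CSharp package referenced earlier.

```csharp
using Microsoft.CodeAnalysis;

[Generator] // marks the class so the compiler loads it as a source generator
public class TestGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context)
    {
        // Register a factory; the compiler uses it to create the syntax receiver.
        context.RegisterForSyntaxNotifications(() => new SyntaxReceiver());
    }

    public void Execute(GeneratorExecutionContext context)
    {
        // retrieve the populated receiver
        if (!(context.SyntaxContextReceiver is SyntaxReceiver receiver))
            return;

        // Code generation based on what the receiver collected goes here.
    }
}

// Placeholder receiver (an assumption for this sketch).
public class SyntaxReceiver : ISyntaxContextReceiver
{
    public void OnVisitSyntaxNode(GeneratorSyntaxContext context)
    {
        // Called for every syntax node in the compilation.
    }
}
```

Because the lambda returns a type implementing ISyntaxContextReceiver, the compiler picks the overload of RegisterForSyntaxNotifications that supplies a semantic model along with each node.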

The Syntax Receiver

The syntax receiver must implement the ISyntaxContextReceiver interface. This has a single method called OnVisitSyntaxNode, which is "called for every syntax node in the compilation".

In the examples from Microsoft, the syntax receiver is often a private class inside the source generator class. We chose to not go that route as our syntax receiver is somewhat large and will be easier to deal with in its own file.

In our syntax receiver, there are two properties:

public List<string> Log { get; } = new();
public List<WorkItems> WorkItems { get; } = new();

The work items are merely a list of things we need to generate code for. Basically, it is the ‘to-do’ list for the source generator’s Execute method.

The log needs a bit more explanation. We need it because we cannot attach a debugger to the compiler itself. Since the source generator runs inside the compiler, there isn’t a good way to see what’s actually happening. So, like an old-fashioned programmer using Console.WriteLine, we are just going to dump everything we can into a text file.

To set this up, place the following in your syntax receiver’s OnVisitSyntaxNode method.

try
{
    if (context.Node is ClassDeclarationSyntax classDeclarationSyntax)
    {
        // The syntax node's identifier is enough for logging at this stage.
        Log.Add($"Found a class named {classDeclarationSyntax.Identifier.Text}");
    }
}
catch (Exception ex)
{
    Log.Add("Error parsing syntax: " + ex.ToString());
}

In order to view this log data, we need to capture it in the source generator’s Execute method.

//Write the log entries
context.AddSource("Logs", SourceText.From($@"/*{ Environment.NewLine + string.Join(Environment.NewLine, receiver.Log) + Environment.NewLine}*/", Encoding.UTF8));

If you enabled EmitCompilerGeneratedFiles in your consuming project, you should see a file named "Generated\Tortuga.TestMonkey\Tortuga.TestMonkey.TestGenerator\Logs.cs" with contents such as:

/*
Found a class named Test_SimpleClass
Found a class named Test_AnotherClass
Found a class named Test_NoDefaultConstructor
Found a class named Test_SimplePair
Found a class named AutoGeneratedProgram
*/

Looking for Classes

The line "context.Node is ClassDeclarationSyntax classDeclarationSyntax" says we’re looking for syntax nodes that represent a class declaration. Any other nodes will be skipped.

If you want to look for something else, you can use the Roslyn syntax visualizer in Visual Studio to determine what kind of syntax node you are looking for. Instructions for installing and using this tool are available on Microsoft Docs.

For this code generator, we don’t actually care much about the syntax nodes. They lack the information we need to accomplish our goals. The next step is to turn that syntax node into a "semantic node". For a class, it would look like this:

var testClass = (INamedTypeSymbol)context.SemanticModel.GetDeclaredSymbol(context.Node)!;

In this example, we’re going to be checking to see if testClass is a unit test class with the MakeTests attribute. If we have other classes in our unit test project, those will also be captured.

To see what we really have, we’ll leverage our log file.

var attributes = testClass.GetAttributes();
Log.Add($"    Found {attributes.Length} attributes");
foreach (AttributeData att in attributes)
{
    Log.Add($"   Attribute: {att.AttributeClass!.Name} Full Name: {att.AttributeClass.FullNamespace()}");
    foreach (var arg in att.ConstructorArguments)
    {
        Log.Add($"    ....Argument: Type='{arg.Type}' Value_Type='{arg.Value?.GetType().FullName}' Value='{arg.Value}'");
    }
}

Here is an excerpt from our updated log file.

Found a class named Test_SimpleClass
    Found 2 attributes
   Attribute: TestClassAttribute Full Name: Microsoft.VisualStudio.TestTools.UnitTesting
   Attribute: MakeTestsAttribute Full Name: Tortuga.TestMonkey
    ....Argument: Type='System.Type' Value_Type='Microsoft.CodeAnalysis.CSharp.Symbols.PublicModel.NonErrorNamedTypeSymbol' Value='Sample.UnderTest.SimpleClass'
    ........Found a INamedTypeSymbol named 'Sample.UnderTest.SimpleClass'
    ....Argument: Type='Tortuga.TestMonkey.TestTypes' Value_Type='System.Int32' Value='-1'

You’ll note we found the attributes on the class as well as any constructor parameter for those attributes. We could have also asked it for att.NamedArguments if we had any other properties on the attribute we needed to read.

One of the attributes refers to the class being tested. We’re going to grab that as well as the enumeration indicating which types of tests are desired.

var makeTestAttribute = testClass.GetAttributes().FirstOrDefault(att => att.AttributeClass?.FullName() == "Tortuga.TestMonkey.MakeTestsAttribute");
if (makeTestAttribute != null)
{
    var classUnderTest = (INamedTypeSymbol?)makeTestAttribute.ConstructorArguments[0].Value;
    var desiredTests = (TestTypes)(int)(makeTestAttribute.ConstructorArguments[1].Value ?? 0);
    // testFramework is determined in the "Detecting Dependencies" section below
    if (classUnderTest != null && desiredTests != TestTypes.None && testFramework != TestFramework.Unknown)
    {
        WorkItems.Add(new(testClass, classUnderTest, desiredTests));
        Log.Add($"Added work item for {classUnderTest.FullName()}!");
    }
}
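The log excerpt above shows the TestTypes argument arriving as a boxed System.Int32 with the value -1 rather than as the enum itself, which is why the code casts through int. The following stand-alone sketch illustrates that conversion. It uses a local copy of the TestTypes enum as it appears later in this article; the TestTypesDemo class and its method are hypothetical names introduced only for illustration.

```csharp
#nullable enable
using System;

[Flags]
public enum TestTypes
{
    None = 0,
    PropertySelfAssign = 1,
    PropertyDoubleRead = 2,
    All = -1 // every bit set, so it also covers flags added in the future
}

public static class TestTypesDemo
{
    // Mirrors the cast in the syntax receiver: the attribute argument is the
    // enum's underlying value as a boxed int, or null if it could not be read.
    public static TestTypes FromConstructorArgument(object? value)
    {
        return (TestTypes)(int)(value ?? 0);
    }
}
```

With this in place, TestTypesDemo.FromConstructorArgument(-1) yields TestTypes.All, and HasFlag returns true for each individual test type, while a null argument falls back to TestTypes.None.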

Before we move on, some extension methods need to be called out. The method FullName() is defined in the file SemanticHelper.cs. In there you’ll find other such helper functions for working with the semantic tree.

We’ll return to the syntax receiver later for more information, but for now we have enough to start working on the code generator.

Generating the Source Code

For this next step, we return to the source generator’s Execute method. To start with, we set up a loop based on the work items collected by the syntax receiver.

foreach (var workItem in receiver.WorkItems)
{
    var fileName = workItem.TestClass.FullName() + ".cs";
    var code = new CodeWriter();
    //populate code here
    context.AddSource(fileName, SourceText.From(code.ToString(), Encoding.UTF8));
}

The CodeWriter is a simple wrapper around a StringBuilder that handles things like indentation. You could skip it and use a StringBuilder directly, but it provides a convenient home for repetitive code such as method declarations.

The first thing we need to add is the test framework’s using statements. This is followed by the namespace and class declaration.

code.AppendLine("//This file was generated by Tortuga Test Monkey");
code.AppendLine();
code.AddTestFramework();
code.AppendLine();
using (code.BeginScope($"namespace {workItem.TestClass.FullNamespace()}"))
{
    using (code.BeginScope($"partial class {workItem.TestClass.Name}"))

The BeginScope function is an AppendLine that also increases the indentation level. FullNamespace is another extension method from the SemanticHelper class. The class name is the same as the class that held the MakeTests attribute.
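To make the idea concrete, here is one way such a writer could be built. This is a minimal sketch, not the CodeWriter from the Tortuga Test Monkey repository: the four-space indent, the private Scope class, and the choice to have BeginScope emit the braces itself are all assumptions made for illustration.

```csharp
using System;
using System.Text;

public class CodeWriter
{
    private readonly StringBuilder _content = new StringBuilder();
    private int _indentLevel;

    public void AppendLine() => _content.AppendLine();

    public void AppendLine(string line)
    {
        // Prefix each line with four spaces per indentation level.
        _content.Append(new string(' ', _indentLevel * 4)).AppendLine(line);
    }

    // Writes the header line and an opening brace, then indents.
    // Disposing the returned object un-indents and writes the closing brace.
    public IDisposable BeginScope(string line)
    {
        AppendLine(line);
        AppendLine("{");
        _indentLevel++;
        return new Scope(this);
    }

    public override string ToString() => _content.ToString();

    private sealed class Scope : IDisposable
    {
        private readonly CodeWriter _writer;

        public Scope(CodeWriter writer) => _writer = writer;

        public void Dispose()
        {
            _writer._indentLevel--;
            _writer.AppendLine("}");
        }
    }
}
```

Returning an IDisposable is what lets the generator wrap each scope in a using block, so the closing brace is emitted even if the body returns early.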

Enumerating the Properties

For each type of test, we’ll create a separate function. Each function starts by checking to see if that type of test was desired, then it will enumerate the properties and emit the appropriate code to our string builder.
 

static void PropertySelfAssign(WorkItems workItem, CodeWriter code)
{
    if (workItem.TestTypes.HasFlag(TestTypes.PropertySelfAssign))
    {
        code.AppendLine();
        code.AppendLine("//Property Self-assignment Tests");

        foreach (var property in workItem.ClassUnderTest.ReadWriteScalarProperties())
        {
            using (code.StartTest($"{property.Name}_SelfAssign"))
            {
                code.AppendLine("var objectUnderTest = CreateObject();");
                code.AppendLine($"var originalValue = objectUnderTest.@{property.Name};");
                code.AppendLine($"objectUnderTest.{property.Name} = originalValue;");
                code.AssertAreEqual("originalValue", $"objectUnderTest.@{property.Name}", "Assigning a property to itself should not change its value.");
            }
        }
    }
}

The extension method ReadWriteScalarProperties is defined as such:

public static IEnumerable<IPropertySymbol> ReadWriteScalarProperties(this INamedTypeSymbol symbol)
{
    return symbol.GetMembers().OfType<IPropertySymbol>().Where(p => (p.GetMethod != null) && (p.SetMethod != null) && !p.Parameters.Any());
}

Note we are not checking to see if the property is public. For our purposes that check isn’t necessary: any private or internal properties we wouldn’t have access to are automatically filtered out of the semantic tree by the compiler.

Working with Partial Methods

In the above code there is a function called CreateObject. This is needed because the code generator won’t always be able to create a suitable object to be tested against. Perhaps the one created with the default constructor is inappropriate or maybe it doesn’t even have a default constructor. To solve this problem, the source generator emits a partial method and matching driver function.

partial void CreateObject(ref Sample.UnderTest.SimpleClass? objectUnderTest);
Sample.UnderTest.SimpleClass CreateObject()
{
    Sample.UnderTest.SimpleClass? result = null;
    CreateObject(ref result);
    if (result != null)
        return result;
    return new Sample.UnderTest.SimpleClass();
}

If the caller doesn’t implement the partial method CreateObject(ref T objectUnderTest), result stays null and we fall back to the default constructor.
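From the consuming project’s side, supplying a custom object means writing the other half of the partial class. The sketch below pairs an abbreviated copy of the generated code with a hand-written implementation. The SimpleClass stand-in, the "Test" value, and the FirstNameOfCreatedObject helper are assumptions added only so the example is self-contained.

```csharp
#nullable enable

// Stand-in for the class under test (assumption).
public class SimpleClass
{
    public string? FirstName { get; set; }
}

// Generated half, abbreviated from the output shown above.
public partial class Test_SimpleClass
{
    partial void CreateObject(ref SimpleClass? objectUnderTest);

    SimpleClass CreateObject()
    {
        SimpleClass? result = null;
        CreateObject(ref result);
        if (result != null)
            return result;
        return new SimpleClass();
    }

    // Hypothetical helper so the result can be observed from outside the class.
    public string? FirstNameOfCreatedObject() => CreateObject().FirstName;
}

// Hand-written half, living in the unit test project.
public partial class Test_SimpleClass
{
    partial void CreateObject(ref SimpleClass? objectUnderTest)
    {
        objectUnderTest = new SimpleClass { FirstName = "Test" };
    }
}
```

If the hand-written half is deleted, the compiler removes the call to the unimplemented partial method and the driver function returns a default-constructed object instead.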

But what if there isn’t a default constructor? We detect that case with the HasDefaultConstructor extension method from SemanticHelper.

public static bool HasDefaultConstructor(this INamedTypeSymbol symbol)
{
    // InstanceConstructors excludes static constructors, which also have no parameters.
    return symbol.InstanceConstructors.Any(c => c.Parameters.Length == 0);
}

As with properties, a non-visible constructor will be filtered out of the semantic tree.

We then emit a slightly different helper function that will throw an exception, thus causing the test to fail.

Sample.UnderTest.NoDefaultConstructor CreateObject()
{
    Sample.UnderTest.NoDefaultConstructor? result = null;
    CreateObject(ref result);
    if (result != null)
        return result;

    throw new System.NotImplementedException("Please implement the method 'partial void CreateObject(ref Sample.UnderTest.NoDefaultConstructor? objectUnderTest)'.");
}

This isn’t the only way it could have been handled. Instead of throwing an exception, the developer could have used a required partial method. A required partial method is one that, if not implemented, will result in a compiler error.

To indicate a partial method is required, you must include an access modifier such as private. For example,

private partial Sample.UnderTest.NoDefaultConstructor CreateObject();

Unlike optional partial methods, a required partial method may return a value. This removes the need for the ref parameter.
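As a rough illustration of the required partial method alternative, the two halves could look like this. It is a sketch that assumes C# 9 or later; the NoDefaultConstructor stand-in, its constructor argument, and the CreatedName helper are hypothetical and exist only to make the example self-contained.

```csharp
// Stand-in for a class under test that has no default constructor (assumption).
public class NoDefaultConstructor
{
    public NoDefaultConstructor(string name)
    {
        Name = name;
    }

    public string Name { get; }
}

// Generated half: the access modifier makes the partial method required.
public partial class Test_NoDefaultConstructor
{
    private partial NoDefaultConstructor CreateObject();

    // Hypothetical helper so the result can be observed from outside the class.
    public string CreatedName() => CreateObject().Name;
}

// Hand-written half: leaving this out produces a compiler error
// instead of a test failure at run time.
public partial class Test_NoDefaultConstructor
{
    private partial NoDefaultConstructor CreateObject()
    {
        return new NoDefaultConstructor("example");
    }
}
```

The trade-off is that the test project will not build until every such method is implemented, whereas the exception-based approach lets the rest of the tests run.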

Detecting Dependencies

The next issue to be solved is the one of unit test frameworks. Even if we limit Tortuga Test Monkey to only the main three test frameworks, it still requires different code to be generated for MSTest, NUnit, and XUnit.

To solve this, we’ll go back to the SyntaxReceiver and give it the ability to see the list of assemblies referenced by the test project. This list can be obtained from the ContainingModule of a node in the semantic tree.

var testFramework = TestFramework.Unknown;
foreach (var assembly in testClass.ContainingModule.ReferencedAssemblies)
{
    if (assembly.Name == "Microsoft.VisualStudio.TestPlatform.TestFramework")
        testFramework = TestFramework.MSTest;
    else if (assembly.Name == "nunit.framework")
        testFramework = TestFramework.NUnit;
    else if (assembly.Name == "xunit.core")
        testFramework = TestFramework.XUnit;
}

From there, a simple switch block in the CodeWriter handles the rest.

public void AddTestFramework()
{
    switch (TestFramework)
    {
        case TestFramework.MSTest:
            AppendLine("using Microsoft.VisualStudio.TestTools.UnitTesting;");
            break;
        case TestFramework.XUnit:
            AppendLine("using Xunit;");
            break;
        case TestFramework.NUnit:
            AppendLine("using NUnit.Framework;");
            break;
    }
}
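The using statement is not the only framework-specific part; the attribute that marks each generated method as a test differs as well. The StartTest helper used earlier in PropertySelfAssign is not shown in this article, so the following is only a sketch of the kind of lookup it would need. The TestAttributeDemo class and its method name are hypothetical, and the enum is a local copy of the values used above.

```csharp
using System;

public enum TestFramework { Unknown, MSTest, NUnit, XUnit }

public static class TestAttributeDemo
{
    // Returns the attribute each framework uses to mark a test method.
    public static string TestMethodAttribute(TestFramework framework)
    {
        switch (framework)
        {
            case TestFramework.MSTest:
                return "[TestMethod]";
            case TestFramework.NUnit:
                return "[Test]";
            case TestFramework.XUnit:
                return "[Fact]";
            default:
                throw new NotSupportedException("No supported test framework was detected.");
        }
    }
}
```

A helper like StartTest could write this attribute on its own line before opening the method scope.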

How to Decouple the Source Generator DLL

In Tortuga.TestMonkey, we decided to allow the unit test projects to directly reference Tortuga.TestMonkey. This is acceptable to us because a unit test project is not deployed, so the extra dependency isn’t much of a concern.

For your source generator, this may not be the case. Fortunately, there is another option. Instead of having the consuming project reference the source generator’s classes, you can inject them. This is done via a RegisterForPostInitialization event in the Initialize method. For example,

context.RegisterForPostInitialization(context =>
{
    context.AddSource("TestTypes",
        @"using System;
        namespace Tortuga.TestMonkey
        {
            [Flags]
            internal enum TestTypes
            {
                /// <summary>
                /// Do not generate any tests for this class.
                /// </summary>
                None = 0,

                /// <summary>
                /// Read a property and assign it to itself, verifying that it hasn't changed.
                /// </summary>
                PropertySelfAssign = 1,

                /// <summary>
                /// Read the same property twice, expecting the same result both times.
                /// </summary>
                PropertyDoubleRead = 2,

                All = -1
            }
        }");
});

If you use this method, don’t forget to set IncludeBuildOutput to false in the project settings.

There are some limitations to this approach you need to be aware of. First, you need to make the helper classes internal. Otherwise, you could end up with naming collisions if multiple libraries use the same source generator.

As you can see, the code is just a large string, which means you will not get any compiler support when writing the class. I suggest creating the code in a separate scratch project to make sure it compiles correctly before pasting it in as a string.

Another problem is more of an IDE issue. Visual Studio will not always see these helper classes. When that happens, the IDE will report compiler errors in the editor even though you can successfully build the project.

Due to these ‘developer experience’ issues, I lean towards just referencing the source generator DLL directly when possible.

About the Author

Jonathan Allen got his start working on MIS projects for a health clinic in the late 90's, bringing them up from Access and Excel to an enterprise solution by degrees. After spending five years writing automated trading systems for the financial sector, he became a consultant on a variety of projects including the UI for a robotic warehouse, the middle tier for cancer research software, and the big data needs of a major real estate insurance company. In his free time he enjoys studying and writing about martial arts from the 16th century. 

 
