Could .NET Source Generator Attacks Be A Danger To Your Code?

Published : 2 Dec 2021 Estimated reading time : 9 minutes.

Tags C# 9 Code Generation .NET Core Visual Studio

Banner with message 'Do you trust externally written C Sharp Source Generators?'

In this post, I highlight the potential dangers of trusting third party .NET Source Generators and show ways to try and spot Supply Chain Attacks trying to inject malicious code into your code base.

This post is part of Dustin Moris Gorski's .NET Advent Calendar. Go check out the other posts at https://dotnet.christmas/

Background

Since they were introduced in .NET 5, I have been a big fan of C# source code generators. They are a powerful tool that (a) help avoid the need to write a lot of boiler plate code and (b) also help improve performance of your code by allowing for code to be generated at compile time for tasks that you would previously may have resorted to using reflection to perform.

If you are not familiar with source code generators, there are lots of places you can find out about them. Here are a few presentations on You Tube that you may find useful

Introduction to Source Generators

See https://youtu.be/cB66gOHConw

Using Source Generators for Fun (and Maybe Profit)

See https://youtu.be/4DVV7FXukC8

Exploring Source Generators

See https://youtu.be/Rgdj8X6i9sA

These videos provide a lot of information as to why source generators are a great addition to .NET. However ...

With Great Power Comes Great Responsibility

I recently became aware of the potential dangers of using source code generators after reading two excellent blog posts.

The first is Mateusz Krzeszowiec's VeraCode Blog which provides an overview of supply chain attacks and how source code generators can be used to generate potentially harmful code that gets baked into your software

The second is Maarten Balliauw 's Blog Post that also describes the problem, but also shows how attributes can be used to try and hide the code from being inspected.

I highly recommend going and reading both of these before continuing, but if you want the TL;DR, here is my take on what these two blogs eloquently describe in great detail.

Supply Chain Attacks

As mentioned in these blogs, a supply chain attack uses what appears to be a harmless component or build process to generate code that is added as part of the Continuous Integration build pipeline. This injected code can later be used to scrape data from users of that software. The most notable recent attack was the SolarWinds exploit.

With source generators, you are potentially open to attack on two fronts.

The first is that the generator has the ability to inspect your code via the syntax tree and/or the semantic tree. Whilst unlikely, there is the risk that the analysis stage could keep a lookout for common coding patterns where usernames, passwords, connection details etc. could be scraped and logged.

However, the second is the more dangerous and that is to generate malicious code that 'dials home' information from users of your software. Maarten's blog shows how this is done and what to look for in the generated code.

Defending Yourself Against Attacks

The NuGet team has a page that describes what to look for when assessing whether to use a NuGet package in general. However, given the risk of a rogue source generator creating malicious code, the packages need closer scrutiny.

In summary, here are a few things to consider before using a source generator from NuGet

Is the package from a reputable source? If the package is from a commercial vendor, this is fairly easy to verify. However, for open source projects, this is a bit harder, which leads to the next check
Is the source code available? If the package is open source, the source code will usually be available on GitHub. For source generators, it is always worth looking over the source code and checking what the generator is doing.
If the source code is not available, it may be worth considering using a tool such as JetBrains DotPeek to decompile the code. Two things to consider with this, though. The first is that this may be a breach of the license terms, so you need to tread carefully. The second is that the code may have been obfuscated, so will be harder to read and work out what it is doing.
For open source projects, can you be sure that the NuGet package published has been built from the code in the GitHub repository. Subject to licensing, you may want to consider forking the repo and creating your own build and manage within your own NuGet feed
Consider using tools to vet packages or static source code analysers to check on generated code.

Ensuring Generated Code Is Visible

Whilst ideally, you will have done the due diligence described above, one of the safeguards you can put in place is to make some changes to your csproj file to ensure that the generated code is visible, so that it can be viewed and tracked in source control.

Writing Compiler Generated Code into Files

By default, generated source code does not get written to files that you can see as the code is part of the Roslyn compile process. This makes it easy for bad actors providing code generators to hide under the radar.

To address this, there are two properties you can add to your project file that will make the generated code visible and trackable in source control

Image showing the project file before editing

The first is the EmitCompilerGeneratedFiles property. This will write out any generated code files to the file system. It should be noted, that the contents of these files are effectively just a log of what has been added to the compilation process. The files themselves do not get used as part of the compilation.

By default, these files with be written to the obj folder in a path structure of

obj\BuildProfile\Platform\generated\generator assembly\generator namespace+name

Within this path there will be one or many files depending on how the source code generator has split the code into virtual files using the 'hintname' parameter to the AddSource method on the GeneratorExecutionContext at the end of the generator's Execute method.

Whilst this gives some visibility, having the files in the obj folder is not really of help as this is usually excluded from source control.

This is where the second project property comes into play.

The CompilerGeneratedFilesOutputPath property allows you to specify a path. This will usually be relative to the consuming project's path. I typically use a path of 'GeneratedFiles'.

Beneath this path, the structure is \generator assembly\generator namespace+name, again with one or more files depending on how the generated divides the generated code.

As with the files within the obj folder, these files do not take part in the actual compilation. However, this is where things get a bit tricky as by default, any *.cs files within the project folder structure get included. This causes the compiler to generate a whole load of errors stating that there is duplicate definitions of code in the files that have been output.

To get around this, an extra section needs to be added to the project file.

Image showing the use of the Remove element within the Compile Element in the project file

Adding the this section excludes the generated files from compilation as they are there purely so that we can see the output in source control (as the actual generated code is already in the compilation pipeline in memory).

<ItemGroup> 
   <Compile Remove="$(CompilerGeneratedFilesOutputPath)\**" /> 
</ItemGroup>

Now that we have the files in the project file structure, it can be added to source control and any changes tracked. This then means that the files form part of the code review process and can be checked for anything 'dodgy' going on in the code

View Generated Files in Visual Studio

Since VS2019 16.10 and now in VS2022, the Solution Explorer window can now drill down into generated files by expanding each source code generator listed under the Analyzers node in the Solution Explorer window.

You may be interested in my previous blog post where I show how to set up source generator debugging in Visual Studio 2019. Whilst a couple of screens differ in VS2022 (due to the new debugging profile window), the guide works as well

This feature does not require the EmitCompilerGeneratedFiles to be enabled as this uses the in-memory compiler generated code (which *should* be the same as the emitted files.

In the screen shot below, I enabled the 'Show All Files' button and expanded out both the compiler generated output in the Analyzers node and also the emitted output files in the GeneratedFiles folder

VS Solution Explorer showing generated files

One thing to also consider is that a sneaky attack may generate different code depending on whether the build is using a Debug or Release profile (as developers usually build locally in debug mode, whilst the CI build engine will be using release)

Demo Code

I have made the solution shown in the above image available in a GitHub repo at https://github.com/stevetalkscode/sourcegeneratorattacks

The solution illustrates the techniques described above using two source code generators.

The first is a simple generator that creates a class that has some of the danger signs to look out for based on Marten's blog.

The second is a demonstration of the new System.Text.Json code generation that has been introduced with .NET 6 for generating serialisation and deserialisation code that would previously have been handled by reflection (and is still the default behaviour unless explicitly used as shown in the demo).

Conclusion

The above discussion may put you off using source generators. This should not be the case as this feature of .NET is incredibly powerful and is starting to be used by Microsoft in .NET 6 with the improvements made to Json serialisation, razor page compilation and logging.

But if you are using source generators that you (or your team) have not written, you need to be aware of the potential dangers of not verifying where the generator has come from and what it is doing to your code.

"Let's be careful out there!"

While You Are Here ...

If you have enjoyed this blog post, you may be interested in my talk that I presented at DDD East Midlands where I discuss my experiences with source code generation of various shades over the past 30 years.

See https://youtu.be/3IuDAp9xV3w