Could .NET Source Generator Attacks Be A Danger To Your Code?

In this post, I highlight the potential dangers of trusting third-party .NET Source Generators and show ways to spot supply chain attacks that attempt to inject malicious code into your code base.


This post is part of Dustin Moris Gorski’s .NET Advent Calendar. Go check out the other posts at https://dotnet.christmas/ 


Background

Since they were introduced in .NET 5, I have been a big fan of C# source code generators. They are a powerful tool that (a) helps avoid the need to write a lot of boilerplate code and (b) helps improve the performance of your code by allowing code to be generated at compile time for tasks that you may previously have resorted to using reflection to perform.

If you are not familiar with source code generators, there are lots of places where you can find out about them. Here are a few presentations on YouTube that you may find useful:

Introduction to Source Generators 

Using Source Generators for Fun (and Maybe Profit)

Exploring Source Generators

These videos provide a lot of information as to why source generators are a great addition to .NET.  However …

With Great Power Comes Great Responsibility

I recently became aware of the potential dangers of using source code generators after reading two excellent blog posts.

The first is Mateusz Krzeszowiec’s VeraCode blog post, which provides an overview of supply chain attacks and how source code generators can be used to generate potentially harmful code that gets baked into your software.

The second is Maarten Balliauw’s blog post, which also describes the problem but goes further to show how attributes can be used to try to hide the generated code from inspection.

I highly recommend going and reading both of these before continuing, but if you want the TL;DR, here is my take on what these two blogs eloquently describe in great detail.

Supply Chain Attacks

As mentioned in these blogs, a supply chain attack uses what appears to be a harmless component or build process to generate code that is added as part of the Continuous Integration build pipeline. This injected code can later be used to scrape data from users of that software. The most notable recent attack was the SolarWinds exploit.

With source generators, you are potentially open to attack on two fronts.

The first is that the generator has the ability to inspect your code via the syntax tree and/or the semantic tree. Whilst unlikely, there is the risk that the analysis stage could keep a lookout for common coding patterns where usernames, passwords, connection details etc. could be scraped and logged.

However, the second is the more dangerous: the generator can emit malicious code that ‘dials home’ information from users of your software. Maarten’s blog shows how this is done and what to look for in the generated code.

Defending Yourself Against Attacks

The NuGet team has a page that describes what to look for when assessing whether to use a NuGet package in general. However, given the risk of a rogue source generator creating malicious code, the packages need closer scrutiny.

In summary, here are a few things to consider before using a source generator from NuGet:

  • Is the package from a reputable source? If the package is from a commercial vendor, this is fairly easy to verify. However, for open source projects, it is a bit harder, which leads to the next check.
  • Is the source code available? If the package is open source, the source code will usually be available on GitHub. For source generators, it is always worth looking over the source code and checking what the generator is doing.
  • If the source code is not available, it may be worth considering using a tool such as JetBrains DotPeek to decompile the code. Two things to consider with this, though. The first is that this may be a breach of the license terms, so you need to tread carefully. The second is that the code may have been obfuscated, so will be harder to read and work out what it is doing.
  • For open source projects, can you be sure that the published NuGet package has been built from the code in the GitHub repository? Subject to licensing, you may want to consider forking the repo, creating your own build and managing it within your own NuGet feed
  • Consider using tools to vet packages or static source code analysers to check on generated code.

Ensuring Generated Code Is Visible

Whilst ideally you will have done the due diligence described above, one of the safeguards you can put in place is to make some changes to your csproj file so that the generated code is written to disk, where it can be viewed and tracked in source control.

Writing Compiler Generated Code into Files

By default, generated source code does not get written to files that you can see, as the code is fed straight into the Roslyn compilation in memory. This makes it easy for bad actors providing code generators to stay under the radar.

To address this, there are two properties you can add to your project file that will make the generated code visible and trackable in source control.

Project file with code generation output using EmitCompilerGeneratedFiles and CompilerGeneratedFilesOutputPath

The first is the EmitCompilerGeneratedFiles property. Setting this writes any generated code files out to the file system. It should be noted that the contents of these files are effectively just a log of what has been added to the compilation; the files themselves do not get used as part of the compilation.

By default, these files will be written to the obj folder in a path structure of

obj\BuildProfile\Platform\generated\generator assembly\generator namespace+name

Within this path there will be one or more files, depending on how the source code generator has split the code into virtual files using the ‘hintName’ parameter of the AddSource method on the GeneratorExecutionContext within the generator’s Execute method.

Whilst this gives some visibility, having the files in the obj folder is not much help, as that folder is usually excluded from source control.

This is where the second project property comes into play.

The CompilerGeneratedFilesOutputPath property allows you to specify a path. This will usually be relative to the consuming project’s path. I typically use a path of ‘GeneratedFiles’.
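Together, the two properties end up looking something like this in the project file (the GeneratedFiles path is just my preferred name):

<PropertyGroup>
  <EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles>
  <CompilerGeneratedFilesOutputPath>GeneratedFiles</CompilerGeneratedFilesOutputPath>
</PropertyGroup>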

Beneath this path, the structure is \generator assembly\generator namespace+name, again with one or more files depending on how the generator divides the generated code.

As with the files within the obj folder, these files do not take part in the actual compilation. However, this is where things get a bit tricky: by default, any *.cs files within the project folder structure get included in the compilation, which causes the compiler to generate a whole load of errors stating that there are duplicate definitions of the code in the files that have been output.

To get around this, an extra section needs to be added to the project file.

Adding the following section excludes the generated files from compilation, as they are there purely so that we can see the output in source control (the actual generated code is already in the compilation pipeline in memory).

<ItemGroup>
  <Compile Remove="$(CompilerGeneratedFilesOutputPath)\**" />
</ItemGroup>

Now that we have the files in the project folder structure, they can be added to source control and any changes tracked. This means that the files form part of the code review process and can be checked for anything ‘dodgy’ going on in the generated code.

View Generated Files in Visual Studio

Since VS2019 16.10, and now in VS2022, the Solution Explorer window can drill down into generated files by expanding each source code generator listed under the project’s Analyzers node.


You may be interested in my previous blog post where I show how to set up source generator debugging in Visual Studio 2019. Whilst a couple of screens differ in VS2022 (due to the new debugging profile window), the guide still works just as well.


This feature does not require EmitCompilerGeneratedFiles to be enabled, as it uses the in-memory compiler-generated code (which *should* be the same as the emitted files).

In the screenshot below, I have enabled the ‘Show All Files’ button and expanded both the compiler-generated output in the Analyzers node and the emitted output files in the GeneratedFiles folder.

VS Solution Explorer showing generated files

One thing to also consider is that a sneaky attack may generate different code depending on whether the build is using a Debug or Release profile (developers usually build locally in Debug mode, whilst the CI build engine will be using Release).
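To give a flavour of what that could look like, here is a sketch (the generator name and emitted strings are invented for illustration and not taken from any real package) showing how a generator can branch on the preprocessor symbols of the current build and pass a hintName to AddSource:

using System.Linq;
using Microsoft.CodeAnalysis;

[Generator]
public class GreetingGenerator : ISourceGenerator
{
    public void Initialize(GeneratorInitializationContext context)
    {
    }

    public void Execute(GeneratorExecutionContext context)
    {
        // the generator can see the preprocessor symbols for the current build,
        // so it could emit different code for Release (CI) builds than for the
        // Debug builds that developers inspect locally
        bool isDebug = context.ParseOptions.PreprocessorSymbolNames.Contains("DEBUG");

        string source = isDebug
            ? "// innocuous code emitted for local Debug builds"
            : "// a hostile payload could be hidden here for Release/CI builds";

        // the 'hintName' argument controls the virtual file name and the name of
        // the emitted file described earlier
        context.AddSource("Greeting.g.cs", source);
    }
}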

Demo Code

I have made the solution shown in the above image available in a GitHub repo at https://github.com/stevetalkscode/sourcegeneratorattacks

The solution illustrates the techniques described above using two source code generators.

The first is a simple generator that creates a class exhibiting some of the danger signs to look out for, based on Maarten’s blog.

The second is a demonstration of the new System.Text.Json code generation introduced with .NET 6 for generating the serialisation and deserialisation code that would previously have been handled by reflection (reflection is still the default behaviour unless the generated code is explicitly used, as shown in the demo).
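For those who have not seen it, the general shape of the System.Text.Json source generation is sketched below (WeatherForecast and AppJsonContext are hypothetical names of my own, not from the demo):

using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public record WeatherForecast(DateTime Date, int TemperatureC, string Summary);

// declaring a partial JsonSerializerContext with the [JsonSerializable] attribute
// triggers the System.Text.Json source generator to emit the serialisation code
[JsonSerializable(typeof(WeatherForecast))]
internal partial class AppJsonContext : JsonSerializerContext
{
}

// usage: passing the generated metadata avoids the reflection-based path, which
// remains the default if you call Serialize without it
// string json = JsonSerializer.Serialize(forecast, AppJsonContext.Default.WeatherForecast);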

Conclusion

The above discussion may put you off using source generators. This should not be the case, as this feature of .NET is incredibly powerful and is starting to be used by Microsoft in .NET 6 with the improvements made to JSON serialisation, Razor page compilation and logging.

But if you are using source generators that you (or your team) have not written, you need to be aware of the potential dangers of not verifying where the generator has come from and what it is doing to your code.

“Let’s be careful out there!”


While You Are Here …

If you have enjoyed this blog post, you may be interested in my talk that I presented at DDD East Midlands where I discuss my experiences with source code generation of various shades over the past 30 years.

Understanding Disposables In .NET Dependency Injection – Part 1

In this post I will be discussing the traps that can catch you out by potentially creating memory leaks when registering types that implement the IDisposable interface as services with the out-of-the-box .NET Dependency Injection container.

Background

Typically, a type will implement the IDisposable interface when it holds unmanaged resources that need to be released or to free up memory.

More information about cleaning up resources can be found on Microsoft Docs.

To keep things simple for the rest of this post, I will be referring to instances of types that implement IDisposable as “Disposable Objects”.

Managing Disposable Objects without Dependency Injection

Outside of dependency injection, if you create an instance of such a type, it is your responsibility to call the Dispose method on the class to initiate the release of unmanaged resources.

This can be done either by calling Dispose directly (typically within a try/finally block) or by wrapping the instance in a using statement or using declaration.

The two approaches are documented on the Microsoft Docs site.
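As a quick illustration of the two approaches (using FileStream as an arbitrary disposable type):

using System.IO;

// approach 1: call Dispose explicitly, typically from a finally block
var stream = new FileStream("data.txt", FileMode.Open);
try
{
    // ... use the stream ...
}
finally
{
    stream.Dispose();
}

// approach 2: a using statement (or using declaration), which the compiler
// expands into the same try/finally pattern for you
using (var stream2 = new FileStream("data.txt", FileMode.Open))
{
    // ... use the stream ...
}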

Disposable Objects Created by the Dependency Injection Container

As a general rule, if the Dependency Injection container creates an instance of the disposable object, it will clean it up when the instance’s lifetime (transient, scoped or singleton) expires (e.g. for scoped instances in ASP.NET Core, this is at the end of the request/response lifetime, but for singletons it is when the container itself is disposed).

The following table (based on the table in the Microsoft Docs page) shows which registration methods will trigger the container to automatically dispose of the object.

Method | Automatic Disposal
Add{LIFETIME}<{IMPLEMENTATION}>() | Yes
Add{LIFETIME}<{SERVICE}, {IMPLEMENTATION}>() | Yes
Add{LIFETIME}<{SERVICE}>(sp => new {IMPLEMENTATION}) | Yes
AddSingleton<{SERVICE}>(new {IMPLEMENTATION}) | No
AddSingleton(new {IMPLEMENTATION}) | No

As you can see from the table above, the three most common methods for adding services, where the container itself is responsible for creating the instance, will automatically dispose of the object at the appropriate time.

However, the last two methods do not dispose of the object. Why? Because in these methods the object has been directly instantiated with the new keyword, and therefore the container has not been responsible for creating it.

Whilst they look similar to the third method, the difference is that the instance in that method is created within a lambda expression that the container itself invokes, so the instance remains under the container’s control.

In the last two methods, the object could be created at the time of registration (by using the new keyword), but it may equally have been created outside these methods (either within the scope of the ConfigureServices method in the StartUp class, or at class level). The container therefore cannot know where the object was created, the scope of its reference, or where else it may be used. Without this understanding, it cannot safely dispose of the object, as doing so could cause an ObjectDisposedException if the object is referenced elsewhere in code after the container has disposed of it.
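To make the difference concrete, here is a sketch using an illustrative MessageBus type (the names are my own):

using System;
using Microsoft.Extensions.DependencyInjection;

public interface IMessageBus { }

public class MessageBus : IMessageBus, IDisposable
{
    public void Dispose()
    {
        // release resources here
    }
}

// in ConfigureServices:
services.AddSingleton<IMessageBus, MessageBus>();            // container-created: disposed for you
services.AddSingleton<IMessageBus>(sp => new MessageBus());  // created inside the factory lambda: disposed for you
services.AddSingleton<IMessageBus>(new MessageBus());        // instance supplied by you: NOT disposed for you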

I will come on to dealing with ensuring these objects referenced in these last two methods can be disposed of correctly in Part 2.

Hiding Disposability from Container Consumers

The first method in the table above is the simplest way to register a type. Consumers will request an instance of the object and make use of it.

However, if the type implements IDisposable, this means that the Dispose method is available to the consumer to call. This has repercussions depending on the lifetime that the dependency has been registered as.

For transients that have been created specifically to be injected into the consuming class, it is not the end of the world. If Dispose is called on a transient, the only place that will suffer is the consuming class (and anything it passes the reference to), as any subsequent references to the object (or, more specifically, to members of the type that check its disposed status) are likely to result in an ObjectDisposedException (depending on the implementation of the injected class).

For scoped and singleton lifetimes, things become more complicated as the object has a lifetime beyond the consumer class. If the consuming class calls Dispose and another consumer then also makes use of a member on the disposed class, that other consumer is likely to receive an ObjectDisposedException.

Therefore, we want to ensure that the Dispose method on the registered class is somehow hidden from the consumer.

There are several ways of hiding the Dispose method, which are considered below.

Explicit Implementation of IDisposable

The quick (and dirty) way of hiding the Dispose method that exists on a class is to change the Dispose method’s declaration from a public method to an explicit interface declaration (as shown below) so that it can only be called by casting the object to IDisposable.
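A sketch of what this looks like (the type name is purely illustrative):

using System;

public class ReportWriter : IDisposable
{
    public void Write(string report)
    {
        // ... write the report somewhere ...
    }

    // explicit implementation: Dispose no longer appears on the public surface of
    // ReportWriter and can only be reached by casting the instance to IDisposable
    void IDisposable.Dispose()
    {
        // release resources here
    }
}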

It should, however, be recognised that this is just obfuscating the availability of the Dispose method. It does not truly hide it as the consumer may be aware that the type implements IDisposable and explicitly cast the object and call Dispose.

This is where extracting out other interfaces comes to our rescue when it comes to dependency injection.

Register the Implementation Type With a More Restrictive Interface

If we define an interface that has all the public members of our class except for the Dispose method, and only make the object available by registering it in the DI container with that limited interface as the service, it becomes much harder (though not completely impossible) for the consumer to dispose of the object, as the concrete type is only known to the container registration (unless the consumer uses GetType() of course, but that is splitting hairs and in many ways negates the whole point of using the container).

Of course, following the Interface Segregation Principle from SOLID, this interface may be broken down into smaller interfaces against which the class is registered.
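Continuing the illustrative ReportWriter example, a sketch of this approach:

using System;
using Microsoft.Extensions.DependencyInjection;

// the service interface deliberately omits Dispose
public interface IReportWriter
{
    void Write(string report);
}

public class ReportWriter : IReportWriter, IDisposable
{
    public void Write(string report)
    {
        // ... write the report somewhere ...
    }

    public void Dispose()
    {
        // release resources here
    }
}

// in ConfigureServices: consumers ask for IReportWriter, so Dispose is not
// visible on the injected reference
services.AddScoped<IReportWriter, ReportWriter>();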

Next Time …

In Part 2 of this series on IDisposable in Dependency Injection, I will move on to dealing with those objects that the container will not dispose of for you.

C# 9 Record Factories

With the release of .NET 5.0 and C# 9 coming up in the next month, I have been updating my Dependency Injection talk to incorporate the latest features of the framework and language.

One of the topics of the talk is using the “Gang of Four” Factory pattern to help with the creation of class instances where some of the data is coming from the DI container and some from the caller.

In this post, I walk through using the Factory Pattern to apply the same principles to creating instances of C# 9 records.

So How Do Records Differ from Classes?

Before getting down into the weeds of using the Factory Pattern, here is a quick overview of how C# 9 record types differ from classes.

There are plenty of articles, blog posts and videos available on the Internet that will tell you this, so I am not going to go into much depth here. However, here are a couple of resources I found useful in understanding records vs classes.

Firstly, I really like the video from Microsoft’s Channel 9 ‘On.NET’ show where Jared Parsons explains the benefits of records over classes.

Secondly, a shout out to Anthony Giretti for his blog post https://anthonygiretti.com/2020/06/17/introducing-c-9-records/ that goes into detail about the syntax of using record types.

From watching/reading these (and other bits and bobs around the Internet), my take on record types vs. classes are as follows:

  • Record types are ideal for read-only ‘value types’ such as Data Transfer Objects (DTOs), as they are immutable by default (especially if using the positional declaration syntax)
  • Record types automatically provide a (struct-like) value equality implementation, so there is no need to write your own boilerplate for overriding the default (object reference) equality behaviour
  • Record types automatically provide a deconstruction implementation, so you don’t need to write your own boilerplate for that either
  • There is no need to explicitly write constructors: the positional syntax gets the compiler to generate one for you, while with the nominal (property) syntax object initialisers can be used instead

The thing that really impressed me when watching the video is the way that the record syntax is able to replace a 40-line class definition, containing all the aforementioned boilerplate code, with a single declaration.
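For example, a single positional declaration like this (Person is just a hypothetical type) gives you the constructor, init-only properties, value equality, ToString and deconstruction:

public record Person(string FirstName, string LastName);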

Why Would I Need A Factory?

In most cases where you may choose to use a record type over using a class type, you won’t need a factory.

Typically you will have some framework doing the work for you, be it a JSON deserialiser, Entity Framework or some code generation tool to create the boilerplate code for you (such as NSwag for creating client code from Swagger/OpenAPI definitions or possibly the new C#9 Source Generators).

Similarly, if all the properties of your record type can be derived from dependency injection, you can register your record type with the container with an appropriate lifetime (transient or singleton).

However, in some cases you may have a need to create a new record instance as part of a user/service interaction by requiring data from a combination of

  • user/service input;
  • other objects you currently have in context;
  • objects provided by dependency injection from the container.

You could instantiate a new instance in your code, but to use Steve ‘Ardalis’ Smith‘s phrase, “new is glue” and things become complicated if you need to make changes to the record type (such as adding additional mandatory properties).

In these cases, try to keep in line with the Don’t Repeat Yourself (DRY) principle and Single Responsibility Principle (SRP) from SOLID. This is where a factory class comes into its own as it becomes a single place to take all these inputs from multiple sources and use them together to create the new instance, providing a simpler interaction for your code to work with.

Creating the Record

When looking at the properties required for your record type, consider

  • Which parameters can be derived by dependency injection from the container and therefore do not need to be passed in from the caller
  • Which parameters are only known by the caller and cannot be derived from the container
  • Which parameters need to be created by performing some function or computation, using inputs that are not themselves directly consumed by the record.

For example, a Product type may have the following properties

  • A product name – from some user input
  • A SKU value  – generated by some SKU generation function
  • A created date/time value – generated at the time the record is created
  • The name of the person who has created the value – taken from some identity source.

The record is declared as follows
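The exact declaration is in the accompanying repo; a minimal sketch along the following lines (the property names are my own) is enough to follow the rest of this post:

using System;

public record Product(string ProductName, string Sku, DateTime CreatedUtc, string CreatedBy);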

You may have several places in your application where a product can be created, so adhering to the DRY principle, we want to encapsulate the creation process in a single place.

In addition, the only property coming from actual user input is the product name, so we don’t want to drag all these other dependencies around the application.

This is where the factory can be a major help.

Building a Record Factory

I have created an example at https://github.com/stevetalkscode/RecordFactory that you may want to download and follow along with for the rest of the post.

In my example, I am making the assumption that

  • the SKU generation is provided by a custom delegate function registered with the container (as it is a single function and therefore does not necessarily require a class with a method to represent it – if you are unfamiliar with this approach within dependency injection, have a look at my previous blog about using delegates with DI)
  • the current date and time are generated by a class instance (registered as a singleton in the container) that may have several methods to choose from as to which is most appropriate (though again, for a single method, this could be represented by a delegate)
  • the user’s identity is provided by an ‘accessor’ type that is a singleton registered with the container that can retrieve user information from the current AsyncLocal context (E.g. in an ASP.NET application, via IHttpContextAccessor’s HttpContext.User).
  • All of the above can be injected from the container, leaving the Create method only needing the single parameter of the product name

(You may notice that there is not an implementation of GetCurrentTimeUtc. This is provided directly in the service registration process in the StartUp class below).

In order to make this all work in a unit test, the date/time generation and user identity generation are provided by mock implementations that are registered with the service collection. By making use of the TryAddXxx extension methods, the service registrations in the StartUp class will only be applied if these registrations have not already taken place.

Keeping the Record Simple

As the record itself is a value type, it does not need to know how to obtain the information from these participants as it is effectively just a read-only Data Transfer Object (DTO).

Therefore, the factory will need to provide the following in order to create a record instance:

  • consume the SKU generator function referenced in its constructor and make a call to it when creating a new Product instance to get a new SKU
  • consume the DateTime abstraction in the constructor and call a method to get the current UTC date and time when creating a new Product instance
  • consume the user identity abstraction to get the user’s details via a delegate.

The factory will have one method Create() that will take a single parameter of productName (as all the other inputs are provided from within the factory class)
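Here is a sketch of the shape of the factory and its delegate dependencies (the names are assumptions based on the description above; the repo is the definitive version):

using System;

public delegate string SkuGenerator();
public delegate DateTime GetCurrentTimeUtc();
public delegate string GetCurrentUserName();

public class ProductFactory
{
    private readonly SkuGenerator _generateSku;
    private readonly GetCurrentTimeUtc _getUtcNow;
    private readonly GetCurrentUserName _getUserName;

    public ProductFactory(SkuGenerator generateSku, GetCurrentTimeUtc getUtcNow, GetCurrentUserName getUserName)
    {
        _generateSku = generateSku;
        _getUtcNow = getUtcNow;
        _getUserName = getUserName;
    }

    // the only caller-supplied value is the product name; the SKU, timestamp and
    // user name come from the injected delegates at the point of creation
    public Product Create(string productName) =>
        new Product(productName, _generateSku(), _getUtcNow(), _getUserName());
}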

Hiding the Service Locator Pattern

Over the years, the Service Locator pattern has come to be recognised as an anti-pattern, especially when it requires that the caller has some knowledge of the container.

For example, it would be easy for the ProductFactory class to take a single parameter of IServiceProvider and make use of the GetRequiredService<T> extension method to obtain an instance from the container. If the factory is a public class, this would tie the implementation to the container technology (in this case, the Microsoft .NET Dependency Injection ‘conforming’ container).

In some cases, there may be no way of getting around this due to some problem in resolving a dependency. In particular, with factory classes, you may encounter difficulties where the factory is registered as a singleton but one or more dependencies are scoped or transient.

In these scenarios, there is a danger of the shorter-lived dependencies (transient and scoped) being resolved when the singleton is created and becoming ‘captured dependencies’ that are trapped for the lifetime of the singleton and not using the correctly scoped value when the Create method is called.

You may need to make these dependencies parameters of the Create method (see the warning about scoped dependencies below), but in some cases this (mis)places a responsibility onto the caller of the method to obtain those dependencies (and thus creates an unnecessary dependency chain through the application).

There are two approaches that can be used in the scenario where a singleton has dependencies on lesser scoped instances.

Approach 1- Making the Service Locator Private

The first approach is to adopt the service locator pattern (by requiring the IServiceProvider as described above), but making the factory a private class within the StartUp class.

This hiding of the factory within the StartUp class also hides the service locator pattern from the outside world as the only way of instantiating the factory is through the container. The outside world will only be aware of the abstracted interface through which it has been registered and can be consumed in the normal manner from the container.
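A sketch of Approach 1, assuming a hypothetical IProductFactory abstraction over the factory described above (all of the names here are my own):

using System;
using Microsoft.Extensions.DependencyInjection;

public interface IProductFactory
{
    Product Create(string productName);
}

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // the nested class can only be reached through the container
        services.AddSingleton<IProductFactory, LocatorBasedProductFactory>();
    }

    // private: the service locator usage is invisible outside of Startup
    private class LocatorBasedProductFactory : IProductFactory
    {
        private readonly IServiceProvider _provider;

        public LocatorBasedProductFactory(IServiceProvider provider) => _provider = provider;

        public Product Create(string productName)
        {
            // resolved when Create is called, not when the singleton is constructed
            var generateSku = _provider.GetRequiredService<SkuGenerator>();
            var getUtcNow = _provider.GetRequiredService<GetCurrentTimeUtc>();
            var getUserName = _provider.GetRequiredService<GetCurrentUserName>();

            return new Product(productName, generateSku(), getUtcNow(), getUserName());
        }
    }
}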

Approach 2 – Redirect to a Service Locator Embedded Within the Service Registration

The second way of getting around this is to add a level of indirection by using a custom delegate.

The extension methods for IServiceCollection allow for a service locator to be embedded within a service registration by using an expression that has access to the IServiceProvider.

To avoid the problem of a ‘captured dependency’ when injecting transient dependencies, we can register a delegate signature that wraps the service locator thus moving the service locator up into the registration process itself (as seen here) and leaving our factory unaware of the container technology.

At this point the factory may be made public (or left as private) as the custom delegate takes care of obtaining the values from the container when the Create method is called and not when the (singleton) factory is created, thus avoiding capturing the dependency.
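And a sketch of Approach 2, again with assumed names (ISkuService stands in for whatever transient service actually produces the SKU in the repo):

using System;
using Microsoft.Extensions.DependencyInjection;

public interface ISkuService
{
    string GenerateSku();
}

public class SkuService : ISkuService
{
    public string GenerateSku() => Guid.NewGuid().ToString("N");
}

// in ConfigureServices:
services.AddTransient<ISkuService, SkuService>();

// the service locator lives inside the registration expression; each invocation of
// the delegate resolves a fresh transient, so the singleton factory never captures one
services.AddSingleton<SkuGenerator>(provider =>
    () => provider.GetRequiredService<ISkuService>().GenerateSku());

services.AddSingleton<ProductFactory>();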


Special warning about scoped dependencies

Wrapping access to transient-registered dependencies with a singleton delegate works because both singleton and transient instances are resolved from the root provider. However, this does not work for dependencies registered with a scoped lifetime, which are resolved from a scoped provider that the singleton does not have access to.

If you use the root provider, you will end up with a captured instance of the scoped type that is created when the singleton is created (as the container will not throw an exception unless scope checking is turned on).

Unfortunately, for these dependencies, you will still need to pass these to the Create method in most cases.

If you are using ASP.NET Core, there is a workaround (already partially illustrated above with accessing the User’s identity).

IHttpContextAccessor.HttpContext.RequestServices exposes the scoped container, but is accessible from singleton services because IHttpContextAccessor is itself registered as a singleton service. Therefore, you could write a delegate that uses this to access the scoped dependencies. My advice is to approach this with caution, as you may find it hard to debug unexpected behaviour.


Go Create a Product

In the example, the factory class is registered as a singleton to avoid the cost of recreating it every time a new Product is needed.

The three dependencies are injected into the constructor, but as already mentioned the transient values are not captured at this point – we are only capturing the function pointers.

It is within the Create method that we call the delegate functions to obtain the real-time transient values that are then used with the caller provided productName parameter to then instantiate a Product instance.


Conclusion

I’ll leave things there, as the best way to understand the above is to step through the code at https://github.com/stevetalkscode/RecordFactory that accompanies this post.

Whilst a factory is not required for the majority of scenarios where you use a record type, it can be of help when you need to encapsulate the logic for gathering all the dependencies that are used to create an instance without polluting the record declaration itself.

Simplifying Dependency Injection with Functions over Interfaces

In my last post, I showed how using a function delegate can be used to create named or keyed dependency injection resolutions.

When I was writing the demo code, it struck me that the object orientated code I was writing seemed to be bloated for what I was trying to achieve.

Now don’t get me wrong, I am a big believer in the SOLID principles, but when the interface has only a single function, having to create multiple class implementations seems overkill.

This was the ‘aha’ moment you sometimes get as a developer where reading and going to user groups or conferences plants seeds in your mind that make you think about problems differently. In other words, if the only tool you have in your tool-belt is a hammer, then every problem looks like a nail; but if you have access to other tools, then you can choose the right tool for the job.

Over the past few months, I have been looking more at functional programming. This was initially triggered by the excellent talk ‘Functional C#’ given by Simon Painter (a YouTube video from the Dot Net Sheffield user group is here, along with my own talk – plug, plug, here).

In the talk, Simon advocates using functional programming techniques in C#. At the end of the talk, he recommends the Functional Programming in C# book by Enrico Buonanno. It is in Chapter 7 of this book that the seed of this blog was planted.

In short, if your interface has only a single method that is a pure function, register a function delegate instead of an interface.
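As a minimal sketch of the idea (the delegate name and conversion are my own; the real listings from the demo appear later in the post):

public delegate double CelsiusToFahrenheit(double celsius);

// in ConfigureServices: the delegate is registered directly against a pure function
CelsiusToFahrenheit convert = celsius => (celsius * 9 / 5) + 32;
services.AddSingleton(convert);

// consumers take a CelsiusToFahrenheit in their constructor and simply invoke it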

This has several benefits

  • There is less code to write
  • The intent of what is being provided is not masked by having an interface getting in the way – it is just the function exposed via a delegate
  • There are no interface implementations to instantiate, so there are fewer memory allocations and the first call may be faster to execute
  • Mocking is easier – you are just providing an implementation of a function signature without the cruft of having to hand craft an interface implementation or use a mocking framework to mock the interface for you.

So with this in mind, I revisited the demo from the last post and performed the following refactoring:

  • Replaced the delegate signature so that it performs the temperature conversion itself instead of returning an interface implementation that has a conversion method
  • Moved the methods in the class implementations to static functions within the startup class (but these could easily live in a new static class)
  • Changed the DI registration to return the result from the appropriate static function instead of using the DI container to find the correct implementation and forcing the caller to execute the implementation’s method

As can be seen from the two code listings, Listing 1 (Functional) is a lot cleaner than Listing 2 (Object Orientated).

Listing 1

Listing 2

Having done this, it also got me thinking about how I approach other problems. An example of this is unit-testable timestamps.

Previously, when I needed a timestamp to indicate the current date and time, I would create an ICurrentDateTime interface with a property for the current date and time. The implementation for production use would be a wrapper over the DateTime.Now property, but for unit testing purposes it would be a mock with a fixed date and time to fulfil a test criterion.

Whilst not a pure function, the same approach used above can be applied to this requirement, by creating a delegate to return the current date and time and then registering the delegate to return the system’s DateTime.Now.

This achieves the same goal of decoupling code from the system implementation via an abstraction, but removes the need to create an unnecessary object and interface simply to bridge to the underlying system property.
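A sketch of that approach (the delegate name is my own choice):

using System;

public delegate DateTime GetCurrentDateTime();

// production registration: a thin bridge to the system clock
GetCurrentDateTime systemClock = () => DateTime.Now;
services.AddSingleton(systemClock);

// in a unit test, no interface or mocking framework is needed;
// just supply a fixed value that satisfies the test
GetCurrentDateTime frozenClock = () => new DateTime(2021, 12, 25, 9, 0, 0);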

If you are interested in looking at getting into functional programming while staying within the comfort zone of C#, I highly recommend Enrico’s book.

The demo of both the OO and Functional approaches can be found in the GitHub project at https://github.com/configureappio/NamedDiDemo.