How to squash EF Core migrations

How to squash EF Core migrations

Each migration in EF core contains a full copy of the snapshot at the time. In a large and active team this soon adds up to millions and millions of lines of code. This is an issue both because it makes compiling slower but also make things like searching for free text in the solution return a bunch of noise. As you can see in the screenshot it's by far the most LoC we have even after cleaning up so the potential gain is huge.

Here is a walkthrough on how we cleaned up about a million lines of code from old migrations. This is not a drag & drop solution that you can copy but hopefully it helps you to adapt it to your solution.

Overview of our approach

  1. Create a new fresh database from the old migrations (We will use this for comparison later)
  2. Find a suitable target migration to be the new initial (We picked one about 3 months old)
  3. Write a script to merge the Up() methods of all earlier migrations into the Up() method of a new migration. We ignored Down() because we don't use it for old migrations.
  4. Generate this migration and add it to the project. In our case we called it 20200730130157_SquashedMigrations1.cs. We used the snapshot from the target migration we had picked.
  5. Generate a 2nd prep migration that inserts into the migrations history that 20200730130157_SquashedMigrations1.cs has already run. We called this 20200730130156_SquashedMigrations1_prep.cs. Note the slightly smaller timestamp on that one to make sure it runs before the real migrations.
  6. Delete the old migrations
  7. Point our config to a new database and run the migrations.
  8. Compare that the schema we generate is equal using the Sql Schema Compare in Visual Studio.
  9. Work through the issues until we have equal schemas. This part is a bit complicated but I will get back to it.
  10. Merge and 🤞 (Ok, we did run more tests both on local and staging databases) 

How to find a suitable migration to squash to?

Our reasoning was this:

  • It has to be far back enough that no non-merged branches might contain older migrations + some margin. While we actually have older branches than 3 months those are basically postponed projects and we were OK with having to remake those migrations at least so we decided on 3 months.
  • It has to be far back enough that all production/staging systems already have updated.
  • It's nice to keep at least some history around if something regarding the database needs to be debugged. Not that we do this often, it's probably mostly a feeling of control. 
  • Ideally pick a point where not too many migrations happen at the same time. 

The first point was the "furthest back" for us so we picked 3 months.

How to generate the new migrations?

This was the part that took the longest. We built a script to make this easier the next time we need to do this. The script is a bit long to post here but the concept is fairly straightforward so you can probably run the script without too much modifications. You will have to modify the two template files though because they contain namespaces that need to match your project. You also need to install CommandLineParser If you want to run the script as is. 

You can find the current version of that script at https://github.com/bokio/EFCoreTools/blob/main/MigrationSquasher/Program.cs and the folder with all the files https://github.com/bokio/EFCoreTools/tree/main/MigrationSquasher.

The script above actually does steps 3 to 6 so after running it the old migrations are gone. Making a commit before running the script is a good idea.

How to use Sql Schema Compare in Visual Studio?

You find it under "Tools -> Sql Server -> Sql Schema Comparison". There is a bunch of settings in this tool that you might want to tweak but in general the default settings should get you far.



After you run Compare here it will tell you if you have any schema differences. As long as you do we need to work through that.

How to fix the schema differences?

I can't give you a perfect answer for this because there can be reasons we haven't found. But what we did find was that most likely it's because the migrations you chose has a "corrupt" migration snapshot. Or rather, it's missing some information

This can happen when your team creates migrations in parallel on different branches. A basic example of a flow that cause this:

Version Parent is on the dev branch.
Version B is branch from Parent and adds a migration.
Version C is branch from Parent and adds a migration.
Version D is B & C merged back into dev.

If we look at the global snapshot: 
B contains Parent + B
C contains Parent + C
D contains Parent + B + C

This is great. The problem is that no migration snapshot contains Parent + B + C. 

So if you pick B or C as you snapshot you will lose some information. Most of the time this is not a huge issue but EF uses these migrations snapshots to decide specificity of columns and if that needs to update. So I suddenly saw issues where one column had varchar(MAX) in the new version of the database but varchar(50) in the old one.


How to create a merged snapshot then?    

In my case the issue was quite straightforward as I had picked a migration where there was only one other parallel migration so I manually stitched these together. I think a more generic solution to this would be to go back in source control to when all the parallel migrations was merged and instead just copy the global snapshot into your initial migration. While I haven't tested this you don't need to worry about trying it out because if something goes wrong you will see that when you compare the schemas again.  

Issues we ran into or we know that could affect people

Everything worked perfectly in production. We run our migrations as an idempotent script in our pipelines and that worked great. What we did see was some small issues in development though where the migration would try to run the new initial migration for developers and fail because tables already existed. Running it again worked. We tracked down that issue to be because when you run the package manager to update the database it actually downloads the migration history in a batch, then decide which migrations to run and run them without rechecking the migrations history. So when using that method to run migrations running the prep migration didn't help at all. If you run migrations as part of the application start this is likely to affect you!