Zero downtime migrations with Entity Framework Core and Azure App Service

Zero downtime migrations with Entity Framework Core and Azure App Service

Can Entity Framework Core migrations work for a "real" production system or is it only suitable for demo projects? The answer: It absolutely can! It doesn't mean there won't be any challenges but most of these are things you will run into with any technology. In our case we have had almost no measured downtime the last few years even though we release whenever we feel like it. 

In this post we will try to get the birds eye view on what it takes to do continuous deployments with EF Core migrations with 0 downtime and the ability to rollback releases. While we will use Azure pipelines for our example it should work with any continuous delivery system. This post assume you know the basics of how EF core migrations works.

The summary of what you will need to do

  • "All" migrations must be backwards compatible
  • Idempotent migration scripts should be generated in the CI/CD pipelines
  • Updating the database should happen during release rather than during runtime
  • The application must NOT do database compatibility checks at runtime 

The process will be something like this:

  1. A developer make a model change and generate a new migration + commit this
  2. In the CD pipeline we generate the sql script to idempotently apply all migrations to a target database
  3. The release pipeline deploy the new code to our warmup slot
  4. The release pipeline execute the script from 2) on the database to update it with the new migration
  5. The release pipeline swaps the slots after warming up the warmup slot 

Let's look at how we can achieve these parts.

Step 1: Setting up your project for CD migrations

This step is fun. It's just about removing code. 

If you have this code in your project you should remove it.

using var db = DB.NewUnsafe();

PS: You should also make sure you don't have db.Database.EnsureCreated(). You never want to use that together with migrations in your application code.

The reasons you don't want to call Migrate() in your code like this are

  1. If the migration fails your application doesn't start. That is not good at all even if a warmup slot certainly make this far less problematic.
  2. You might run into race conditions if you have more than one server.   
  3. Having quick startup times helps on scale-out. So doing more work there is preferable to avoid. 
  4. Ideally you probably don't want to allow the database user you use in production to have DDL rights. EF is quite good at preventing SQL-injection so the risk is less, but if you do any manual sql it's nice to not risk a hacker dropping your whole database. 

Step 2: Adding migrations that are backwards compatible

Before we can deploy any migrations we have to add them. We are using EF Core from .net 4.7.2 so for us this is done through Nuget Package Manager and we run.

Add-Migration SomeNewMigration

Read more in the official docs on how that works. The important thing is that we will now have a new migration. Because we removed db.Database.Migrate() you have to apply this migration yourself on your dev machine through Update-Database or dotnet ef database update

Important note: If we want 0-downtime deploys we must always be able to run the old version of the code on the newer version of the database. That means there is a bunch of migration operations you have to avoid.

  • Drop table
  • Rename table
  • Drop column
  • Rename column
  • Change column type to something incompatible with the old type
  • Adding a non-nullable column without a default value to an existing table
  • Adding a constraint that is not compatible with the old code version
  • More operations ... (We have not figured all of them out)

The reasons for this is important to understand so I will mention it here even if I plan to dive much deeper into this topic in a later post.

Imagine we are at V1 of our code and have this entity.

public class Dummy
   public int Id { get; set; }
   public string Name { get; set; }

Querying that table by doing db.Dummies.ToList() gives the following SQL:

SELECT [d].[Id], [d].[Name]
FROM [Dummies] AS [d]

Now lets pretend we decide to drop the name column in V2. After that code change and migration the following would change to:

public class Dummy
   public int Id { get; set; }

And we also removed the Name column from the database + our new queries look like this

SELECT [d].[Id]
FROM [Dummies] AS [d]

This works perfectly fine when you are developing. The issue is when you try to deploy this without causing downtime.

Remember our steps from above

4) The release pipeline execute the script from 2) on the database to update it with the new migration
5) The release pipeline swaps the slots after warming up the warmup slot 

The timing between 4) is completed and 5) is completed is most likely minutes. During this time any time we query our Dummies in the production slot (V1 code) we will use the first query that includes the name column. But we just dropped that column! Sql Server doesn't become very happy with you if you do this and will throw an error and all our endpoints that touch dummies will now crash for all users for a few minutes. Beside those few minutes we also made sure we can revert to V1 even if we have an emergency bug without breaking things even after we did the swap. 

Hmm, so what about if we reverse the order and deploy the new code and then make the DB migration? In this case that actually would work quite fine, except you still can't do rollbacks. BUT that will in many cases break for insert and updates instead!

Inserting our dummies in V1 generates an sql statement something like this.

INSERT INTO [Dummies] ([Name])
VALUES (@p0)  

Now imagine if we added a new column to the code so we try to insert a column that does not yet exist in the database. Sql Server will once again be quite unhappy with you and throw exceptions. So that approach just changes what things we break.  

Step 3: Generate the idempotent sql script as part of the CD build

So when we now have our migrations and we know we can't apply them during runtime and we certainly don't want to do it manually it more or less force us to do it in our release pipeline.

Doing this with EF core when you are running .net core application is very easy. Just run dotnet ef migrations script --idempotent --output <filename>.sql as part of the CD pipeline. In our case though we are still on .net 4.7.2 and cant use the dotnet tool + we do some small modifications to the generated script so we are running a small application that generates the script for us instead.

Basically that is just a small console application with the following lines. 

var migrator = db.GetService<IMigrator>();
var sql = migrator.GenerateScript(idempotent: true);
File.WriteAllText(fileName, sql);

And in our YAML pipeline we run this console application with this step (after building it):

  - script'Bokio.Migrator.exe -g $(build.artifactstagingdirectory)\migration.sql'
    displayName'Generate Bokio migration script'

That migration script will now be part of our artifact which is nice because we can both use it in the release pipeline but also inspect it later if we need to trace down some issue.


Step 4: Setting up a deploy to Azure App Service that migrates the database

The last step we need to do is to setup our release pipeline to use this. 

But before that remember that we an App Service with a warmup slot and our pipeline need to do the following steps. 

1. Deploy code to the warmup slot
2. Execute the db migrations
3. Swap the slots

The content of the database migration script is not more complicated than this. 

Example of setting for the Azure SQL database deployment step

Note that the Connection string is a setting we pull from an Azure Key Vault into  the release pipeline using the Azure Key Vault task. This connection string uses a login that has DDL rights on that database. It is probably possible to use the managed identies for this. But we haven't tried this yet.

And we are done!

That's it really, there is nothing more we have to do to make this work. As long as you keep your migrations backwards compatible you can now release new code with database migrations without your users even noticing it. And whenever you cause a serious bug that require a quick rollback rather than waiting for a new deploy to fix it. You can either just deploy the previous version of the code again or go into the app-service and swap the slots manually.