Working At Work: This And Other Thoughts On High Availability

Written by Ken Dale and Thomas Sobieck

Over the previous year we’ve been working to improve our overall uptime. While we aren’t prepared to offer 99.999% availability in the way a cloud provider may, we’re better prepared to meet unplanned outages. Not only is high availability making us more reliable, it means we can perform more tasks during the day.

Benefits

More resilient to outages

This is what you expect: Applications that stay online when unexpected things happen. That’s good news for everyone.

Work during the workday

A benefit you may not immediately consider, but one that has been great for us, is working during the workday. If your systems are expected to be available during core hours, leaving no room for planned downtime in a typical workday, you can still take part of the system offline safely. This has enabled us to move applications to different Azure App Service Plans during the workday, which wouldn’t be feasible without another available instance to handle the incoming requests.

We can also scale our App Service Plans up or down. As long as we don’t scale all instances at the same time, Azure Traffic Manager will stop routing traffic to the unavailable instances and favor the online ones (after it discovers they are down and the DNS TTL expires…). Azure Traffic Manager is DNS-based, so there’s some latency with failover, but that can be overcome by eagerly disabling endpoint routing before the planned outage (in our case, we can use Terraform to easily stop an entire region from receiving requests).
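To get a feel for the failover latency described above, here is a rough back-of-the-envelope model in Python. The function name, its parameters, and the sample numbers are illustrative assumptions, not values from our configuration or a formula taken from Azure's documentation. It simply adds the time needed to detect a failed endpoint to the time a cached DNS answer may linger.

```python
def worst_case_failover_seconds(probe_interval, tolerated_failures, probe_timeout, dns_ttl):
    """Rough upper bound on how long clients may keep hitting a dead endpoint.

    Simplified model (an assumption, not an official formula):
    - the monitor needs (tolerated_failures + 1) consecutive failed probes,
    - the last probe may take up to probe_timeout to fail,
    - clients may keep using a cached DNS answer for up to dns_ttl.
    """
    detection = (tolerated_failures + 1) * probe_interval + probe_timeout
    return detection + dns_ttl


# Hypothetical settings: probe every 30s, tolerate 3 failures,
# 10s probe timeout, 60s DNS TTL.
unplanned = worst_case_failover_seconds(30, 3, 10, 60)

# Planned work: the endpoint is disabled ahead of time, so no detection
# is needed and only the DNS TTL has to run out.
planned = 60

print(unplanned, planned)
```

Under this model the detection time dominates an unplanned outage, which is why disabling the endpoint ahead of planned work shortens the window so much.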

We’re able to rebuild infrastructure, make changes, and experiment (within reason), all in production during the workday. As long as we do things in the proper order, we’re OK. There is a risk of error from doing things out of order, but if something goes sour you’ve got a team available to help, rather than eating dinner, putting their kids to bed, or relaxing for the evening.

Challenges

Adding more Azure Web Apps into the mix doesn’t come without additional complexity. These applications need to be able to move traffic between instances seamlessly and handle multiple instances starting up simultaneously.

Shared secrets

If you need to access encrypted cookies and similar data between instances, you’ll need to share the encryption/decryption capability with the other instances. This just works if you have a single instance. But once you have a completely separate Azure Web App (not using the scale out feature), you’ll need to handle ASP.NET full framework machine keys and ASP.NET Core Data Protection keys yourself.
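The reason the keys must be shared can be shown with a small language-neutral sketch in Python. This is not ASP.NET machine keys or the Data Protection API, and it signs the cookie with HMAC rather than encrypting it; the key values and function names are hypothetical. The point is only that an instance holding a different key cannot read what another instance issued.

```python
import hashlib
import hmac


def sign_cookie(value: str, key: bytes) -> str:
    """Return 'value.signature' so tampering can be detected."""
    signature = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{value}.{signature}"


def read_cookie(cookie: str, key: bytes):
    """Return the value if the signature matches this instance's key, else None."""
    value, _, signature = cookie.rpartition(".")
    expected = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(signature, expected) else None


shared_key = b"hypothetical-shared-key"      # both instances load the same key
isolated_key = b"hypothetical-per-instance"  # an instance that generated its own

cookie = sign_cookie("user=42", shared_key)  # issued by instance A

print(read_cookie(cookie, shared_key))    # instance B, same key: value is readable
print(read_cookie(cookie, isolated_key))  # instance C, different key: rejected
```

A user whose request moves from instance A to instance C would look signed out, which is the failure mode that sharing keys avoids.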

Health checks

We needed an endpoint to know when an app is healthy and should have traffic routed to it, as needed by Azure Traffic Manager. For ASP.NET full framework we ended up creating a NuGet package to reuse ASP.NET Core health checks. See our post [Using ASP.NET Core Health Checks With ASP.NET Full Framework]({% post_url 2019-02-12-using-aspnet-core-health-checks-with-aspnet-full-framework %}) for more details.
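As a rough illustration of what such an endpoint decides, here is a minimal Python sketch. It is not the ASP.NET Core health checks API or our NuGet package; the function name, check names, and response shape are assumptions made for the example. It runs a set of named checks and maps the result to a status code that a monitor expecting 200 can act on.

```python
import json


def evaluate_health(checks):
    """Run named checks and build an HTTP-style (status_code, body) pair.

    `checks` maps a name to a zero-argument callable returning True when healthy.
    A check that raises is treated as unhealthy.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "Healthy" if check() else "Unhealthy"
        except Exception:
            results[name] = "Unhealthy"

    healthy = all(state == "Healthy" for state in results.values())
    # 200 signals "keep routing traffic here"; 503 signals "stop".
    status_code = 200 if healthy else 503
    body = json.dumps({"status": "Healthy" if healthy else "Unhealthy", "checks": results})
    return status_code, body


def failing_database_check():
    raise ConnectionError("database unreachable")  # simulated outage


print(evaluate_health({"self": lambda: True}))
print(evaluate_health({"self": lambda: True, "database": failing_database_check}))
```

Which dependencies belong in the check is a judgment call: including a shared dependency such as a database means every instance may report unhealthy at once when it fails.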

A few more assets

With high availability comes the need for load balancing between instances and shared backend state, if needed.

Architect applications for multiple instances

Applications themselves need to be able to handle being on multiple instances. Any file system that is local to the instance shouldn’t be used (other than for temporary items, for example something generated during a request). Database migrations and other behavior happening on startup also need to handle multiple simultaneous startups between apps, which happens when we swap all staging slots into production at the same time.
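One way to make startup migrations safe under simultaneous startups is to have each app "claim" a migration with a unique insert, so only the first claimant runs it. The Python sketch below uses SQLite as a stand-in database; the table name, migration id, and function are hypothetical, and this is not necessarily how our apps or any particular migration tool handle it.

```python
import sqlite3


def apply_migration_once(conn, migration_id, statements):
    """Apply a migration unless it has already been recorded.

    Returns True if this caller applied it, False if it was already applied.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS schema_migrations (id TEXT PRIMARY KEY)")
    with conn:  # commit on success, roll back if a statement raises
        try:
            # The primary key makes this insert fail for every caller but the first.
            conn.execute("INSERT INTO schema_migrations (id) VALUES (?)", (migration_id,))
        except sqlite3.IntegrityError:
            return False  # another app already claimed this migration
        for statement in statements:
            conn.execute(statement)
    return True


conn = sqlite3.connect(":memory:")
migration = ["CREATE TABLE widgets (id INTEGER PRIMARY KEY, name TEXT)"]

first = apply_migration_once(conn, "001_create_widgets", migration)   # first app to start
second = apply_migration_once(conn, "001_create_widgets", migration)  # another app starting up
print(first, second)
```

Because the claim and the migration share one transaction, a failed migration also rolls back the claim, leaving it free for the next app to retry.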

In closing

High availability doesn’t come free, but we feel it has been worth the investment overall.

Published August 29, 2019 by
