Graceful web server shutdowns behind HAProxy

Fully graceful incremental deploys are hard. Any number of events can deliver brief spates of 5xx errors to your clients, and getting all of the pieces right isn't trivial. And, if something isn't quite right, it can be hard to detect problems from a few seconds of downtime on a server over the course of a deploy.

If you see 5xx errors from HAProxy because it was unable to connect to a server that it thought was available or because a server stopped sending data, it often points to issues with graceful shutdown logic. If you're at a company that deploys rarely and that isn't under heavy load, focusing on these few seconds might seem a bit crazy. So why should you care about those few seconds?

There's a real sense of satisfaction that comes with making something solid and dependable that users can rely on — not to mention a few practical reasons:

It's simplest to alert on low error rates (especially 0!). If your alerting has to allow for an "expected error rate" during deploys, real problems become harder to spot, or you get false-positive alerts.
"Treat servers like cattle, not pets": Handling shutdowns gracefully allows you to safely rotate instances and put servers on spot instances.
Graceful deploys let you easily scale in and out as traffic changes over the course of a day without sending 5xxs to clients.

At ClassDojo, we deploy 10 to 20 times a day, and we need to scale in and out to handle traffic throughout the school day. Having graceful web server shutdowns behind HAProxy lets us make that happen.

What does a graceful shutdown look like?

Let's not worry about how we update HAProxy configuration, or how we actually get new code deployed to an instance (or a new container running on K8s or Nomad or wherever). Our HAProxy is happily running with this configuration:

backend the-trunk
  mode http
  option httpchk GET /status
  timeout server 20s
  rspadd X-Clacks-Overhead:\ GNU\ Terry\ Pratchett

  server tower-0:version1 10.1.0.127:8080 minconn 1 maxconn 10 rise 2 fall 2 check inter 2s
  server tower-1:version2 10.1.0.127:8081 minconn 1 maxconn 10 rise 2 fall 2 check inter 2s
  server tower-2:version2 10.1.0.127:8082 minconn 1 maxconn 10 rise 2 fall 2 check inter 2s

option httpchk GET /status: make HTTP GET requests to the /status route on the running server. (Note: you can also do this with a plain TCP connection check to a port, but I like the clarity and simplicity of HTTP. TCP checks are much cheaper, and a good choice if you're running load balancers under heavier load.)
server tower-0:version1 10.1.0.127:8080: the server's name, followed by the address and port where the web server is listening
rise 2: require 2 consecutive successful checks (2xx or 3xx responses) to mark a server as up
fall 2: require 2 consecutive failed checks (4xx or 5xx responses, timeouts, or connection failures) to mark a server as down
check inter 2s: run the /status check every 2 seconds
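As a rough sketch of what these settings mean for timing: if checks run every inter milliseconds and a state change needs a run of consecutive results, HAProxy needs roughly count × inter to notice the change. The helper name healthCheckDetectionMs is hypothetical, and the calculation assumes a fixed check interval (no fastinter or downinter overrides) and ignores the time each check takes.

```javascript
// Hypothetical helper: approximate upper bound on how long HAProxy takes to
// change a server's state, given `inter` and the matching `rise` or `fall` count.
// Assumes checks run at a fixed interval and ignores check latency.
function healthCheckDetectionMs({ interMs, count }) {
  return interMs * count;
}

// Values from the config above: check inter 2s, rise 2, fall 2
const timeToMarkDownMs = healthCheckDetectionMs({ interMs: 2000, count: 2 });
const timeToMarkUpMs = healthCheckDetectionMs({ interMs: 2000, count: 2 });
console.log({ timeToMarkDownMs, timeToMarkUpMs });
```

With this configuration, a server that starts failing its checks can keep receiving new requests for up to about 4 seconds, so it has to keep serving traffic for at least that long after it starts reporting itself as down.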

Given this configuration, our shutdown logic should look like:

Send a signal (SIGTERM) to one of the web server processes.
The web server updates its /status route to start returning 503s, indicating that it's down.
After two failing checks, HAProxy stops sending new traffic to the server. The server may still be handling requests.
The server waits for all of the remaining requests to complete. It can then safely clean up and shut down. If any requests aren't complete by the time the server is shutting down, it should log an error.

Here's some simplified Node.js pseudocode that illustrates what those steps might look like. (We know there are more maintainable ways of writing much of this code: this is intended as illustration only.)

const app = require('express')();
app.listen(8081);

const haproxyStatus = {
  outstandingRequestCount: 0,
  // 'up' may start off false if you need to do any setup before serving traffic
  up: true,
}

// tracking outstanding requests lets us see whether it's safe to shut down
app.use((req, res, next) => {
  haproxyStatus.outstandingRequestCount++;
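  // "close" fires both when the response completes and when the client disconnects early,
  // so aborted requests don't leave the count stuck above zero
  res.once("close", () => haproxyStatus.outstandingRequestCount--);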
  next();
});

// health check route
app.get('/status', (req, res) => res.sendStatus(haproxyStatus.up ? 200 : 503));

// regular routes
app.get('/small_gods', (req, res) => res.send("Time is a drug. Too much of it kills you. — Terry Pratchett"));
app.get('/jingo', (req, res) => res.send("Give a man a fire and he's warm for a day, but set fire to him and he's warm for the rest of his life. — Terry Pratchett"));

function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function drainRequests () {
  if (haproxyStatus.up) {
    throw new Error("We cannot drainRequests until HAProxy is aware we're down");
  }
  while (true) {
    if (haproxyStatus.outstandingRequestCount === 0) return;
    await delay(100);
  }
}

async function reportDownToHAProxy() {
  // we start reporting that we're down to HAProxy
  // but it takes time for HAProxy to run its health checks `fall` times
  haproxyStatus.up = false;

  // server tower-0 10.1.0.127:8080 minconn 1 maxconn 10 rise 2 fall 2 check inter 2s port 8081
  const CHECK_INTER = 2_000; // check inter 2s
  const FALL_COUNT = 2; // fall 2

  // (note: if you have a single load balancer, you could count the number of `/status` requests that you've responded to to determine whether the load balancer is aware that you're down.
  // This works, but it gets more complicated when you have multiple load balancers that you need to wait for, and seems a little harder than just calculating a time that's _probably_ safe)

  await delay(FALL_COUNT * CHECK_INTER);
}

async function cleanUp () {/*any cleanup work you have*/}

async function gracefulShutdown () {
  await reportDownToHAProxy();
  // at this point, HAProxy should be aware that this server is down and will stop sending us requests
  // we likely still have a few requests outstanding, and `haproxyStatus.outstandingRequestCount` will be > 0
  // (note: we're not worrying about connection timeouts here. If we were, we'd add time to our delay)

  // timeout server 20s
  const MAX_REQUEST_TIME = 20_000;
  // Promise.race takes a single iterable of promises
  await Promise.race([
    drainRequests(),
    delay(MAX_REQUEST_TIME),
  ]);

  // after draining requests, do any other cleanup work necessary to make exiting safe, such as closing database connections
  await Promise.race([
    cleanUp(),
    delay(5_000),
  ]);

  // at this point, all requests should be complete (or canceled by HAProxy)
  // and we should naturally exit once nothing is keeping the event loop alive
  // (this assumes cleanUp closed the HTTP server and any other open handles)
  setTimeout(() => {
    console.error("We have to force exit because we didn't clean up all callbacks (unref means we don't need to worry about this setTimeout)")
    process.exit(1);
  }, 1000).unref()
}

// our deployment process sends a signal to the running process or container
// to let it know that it's time to shut down
process.on("SIGTERM", gracefulShutdown);
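The shutdown steps say the server should log an error if requests are still running when it shuts down, and the Promise.race in gracefulShutdown doesn't tell us which side won. One possible way to surface that is sketched below. drainWithDeadline and drainOrLog are hypothetical helpers, not part of the code above, and the polling interval is an arbitrary choice.

```javascript
// Hypothetical helper: poll a counter until it reaches zero or the deadline passes.
// Resolves with the number of requests still outstanding (0 means fully drained).
async function drainWithDeadline(getOutstandingCount, deadlineMs, pollMs = 100) {
  const deadline = Date.now() + deadlineMs;
  while (getOutstandingCount() > 0 && Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  return getOutstandingCount();
}

// Hypothetical wrapper: log an error when the deadline cut requests off.
async function drainOrLog(getOutstandingCount, deadlineMs, log = console.error) {
  const remaining = await drainWithDeadline(getOutstandingCount, deadlineMs);
  if (remaining > 0) {
    log(`Shutting down with ${remaining} request(s) still outstanding`);
  }
  return remaining;
}
```

In gracefulShutdown, a call like drainOrLog(() => haproxyStatus.outstandingRequestCount, MAX_REQUEST_TIME) could stand in for the first Promise.race. Because the loop stops polling at the deadline, it also doesn't leave a timer running after the drain is abandoned.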

How can you tell whether things are going wrong?

One of the most important parts of software engineering is knowing when things are going wrong. Thankfully, HAProxy logs make it pretty clear when something is broken with web server shutdowns. HAProxy logs have a ton of information in them, but we're only concerned with requests that terminate with sC-- or S*--. A leading s means a server-side timeout expired (in sC--, the connect timeout ran out before HAProxy could reach the web server), and a leading S means the server aborted or refused the TCP connection. The second character shows where in the request lifecycle the session was when it ended: C while connecting to the server, H while waiting for response headers, D while transferring data. The HAProxy docs on termination states are incredibly useful for understanding these termination problems.

Example log with SH-- termination state:

10.1.0.127:8080 [01/Jul/2021:00:99:00.000] http-in the-trunk/tower-1:version0 0/0/0/-1/3022 502 0 - - SH-- 0/0/0/0/0 0/0 "GET /jingo HTTP/1.1"
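If you want to count these during a deploy, a small script can pull the termination state out of each log line. This is a minimal sketch that assumes HAProxy's default HTTP log format, where the 4-character termination state is followed by five slash-separated connection counters. The function names are hypothetical.

```javascript
// Sketch: pull the termination state out of an HAProxy HTTP log line.
// Assumes the default HTTP log format, where the 4-character termination
// state is followed by five slash-separated connection counters.
const TERMINATION_STATE = /\s([A-Za-z-]{4})\s\d+\/\d+\/\d+\/\d+\/\+?\d+\s/;

function terminationState(logLine) {
  const match = TERMINATION_STATE.exec(logLine);
  return match ? match[1] : null;
}

// A leading "s" (server-side timeout) or "S" (server aborted or refused the
// connection) is the pattern this post associates with shutdown problems.
function isServerSideFailure(state) {
  return state !== null && (state[0] === 's' || state[0] === 'S');
}

const line = '10.1.0.127:8080 [01/Jul/2021:00:99:00.000] http-in the-trunk/tower-1:version0 0/0/0/-1/3022 502 0 - - SH-- 0/0/0/0/0 0/0 "GET /jingo HTTP/1.1"';
const state = terminationState(line);
console.log(state, isServerSideFailure(state)); // SH-- true
```

Counting the lines where isServerSideFailure returns true over the course of a deploy gives a quick signal for whether shutdowns are actually graceful.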

Shutting down this post

This style of incremental graceful web-server deploy isn't the only way of tackling this problem. Some engineering teams only deploy rarely, and accept seconds or minutes of downtime during deploys. Other teams use blue-green deploys to switch over all web servers at once. Our engineering team values the speed, stability, reliability, and autoscaling that our graceful web-server shutdowns enable. And, while it takes a bit of code to make happen, the fundamental idea is simple: wait until HAProxy is aware that a server is down, and then start waiting for any outstanding requests to complete.