You should store request_uuid in (almost) every table

Automatically storing the request_uuid of the last request that touched a row simplifies a huge amount of debugging work.

You don’t need to do these sorts of investigations every day, but when you do, you’ll be glad that you can easily see exactly which request changed the data.

💬

I like the recommendation from YAGRI ("you are gonna read it") to default to storing fields like created_by and updated_at, but I think a request_uuid or trace_id is a better way of accomplishing that goal because it gives you even more debugging information.

Do your requests not have a request_uuid or trace_id?

You should fix that! Having a unique identifier per request makes instrumentation and logging much simpler. If you return that request_uuid to clients, it makes it much simpler to debug client crash reports with backend information about why a particular request received a 4xx error or anything like that.1

Do you not store information about (mutative) requests in an audit table?

You should fix that too! Aside from making some debugging tasks much easier, it can also enable analytics work, particularly in situations where you don’t have good automated event logging.

Aside from request_uuid, the essential fields you’ll likely want in a requests table are things like entity_id, route, full_path, timestamp, method, http_status, response_code, client_platform, client_version, backend_version, and request_duration.2

You may also want to store ip_address for help investigating security incidents, but that will open up data retention and access issues. In many jurisdictions (including the EU, UK, and Canada), an IP address by itself can be considered personally identifiable information, so you will likely need to store it in a separate table and have clear lifecycle and access rules for it.
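For illustration, a requests audit table with those fields might look something like the DDL below. The names and types are hypothetical, and ip_address is deliberately absent since it belongs in its own locked-down table:

```javascript
// Hypothetical DDL for the requests audit table; adjust names and types
// to your system. No ip_address column here: that lives separately.
const create_requests_table = `
    CREATE TABLE requests (
        request_uuid        binary(16) PRIMARY KEY,
        entity_id           binary(16),
        route               varchar(255) NOT NULL,
        full_path           varchar(1024) NOT NULL,
        method              varchar(8) NOT NULL,
        http_status         smallint NOT NULL,
        response_code       varchar(64),
        client_platform     varchar(32),
        client_version      varchar(32),
        backend_version     varchar(32),
        request_duration_ms int NOT NULL,
        timestamp           datetime(3) NOT NULL
    );`;
```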

So, how do we actually accomplish this?

The key piece is making this system automatic. If engineers have to think about threading request_uuid through every function call, adding it to table definitions, and making sure it’s present in every insert or update statement, it won’t happen. Or if it does, there will be gaps. We want this to be foolproof.

I built a system to store latest_request_uuid for every table across every database at ClassDojo. It took less than a day to build, but it’s had a huge impact on overall system legibility.

  1. Update all existing tables so they have a latest_request_uuid field (and optionally a created_by_request_uuid one).3

    const table_rows = await query(`show tables;`);
    // `show tables` returns one row per table; pull the name out of each row.
    const table_names = table_rows.map((row) => Object.values(row)[0]);
    
    const queries = table_names.map((table_name) =>
        `ALTER TABLE ${table_name} ADD COLUMN latest_request_uuid binary(16) DEFAULT NULL, ALGORITHM=INSTANT;`
    );
    // Sleep between ALTERs so the migration doesn't hammer the database.
    const migration_query = queries.join("\nSELECT SLEEP(5);\n");
    
  2. Add a check to lock down creating any new tables without this column. This depends on having a good test environment that exactly mirrors production! (We even added a custom lint rule for this later)

    const all_tables = (await query(`show tables;`)).map((row) => Object.values(row)[0]);
    const tables_with_request_uuid = (await query(`
        SELECT DISTINCT TABLE_NAME
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE COLUMN_NAME = 'latest_request_uuid'
        AND TABLE_SCHEMA = 'my_schema';`)).map((row) => row.TABLE_NAME);
    
    const tables_missing_column = all_tables.filter(
        (table_name) => !tables_with_request_uuid.includes(table_name)
    );
    
    if (tables_missing_column.length) {
        const query_to_fix = tables_missing_column.map((table_name) => {
            return `ALTER TABLE ${table_name}
            ADD COLUMN latest_request_uuid binary(16) default NULL,
            ALGORITHM=instant`;
    
        });
        console.error(`Helpful error message here`, query_to_fix.join('\n'));
        process.exit(1);
    }
    
  3. Start writing request_uuid. What this looks like will depend on how you’re doing writes and what language you’re working in. In JS, AsyncLocalStorage makes it easy to instrument a web server and gives you an easy way to reference the request_uuid of the current request from your ORM or query builder.

  4. If you have any spots where you’re doing a select *, make sure latest_request_uuid is filtered out of the results. (If your RDBMS supports invisible columns, that’s an easier way of making this happen!)
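For step 3 above, a minimal sketch of the AsyncLocalStorage approach (the function names are mine, not a specific ORM's API):

```javascript
// Sketch: use Node's AsyncLocalStorage to make the current request's uuid
// available anywhere in the call stack without passing it explicitly.
const { AsyncLocalStorage } = require("node:async_hooks");

const requestContext = new AsyncLocalStorage();

// Wrap each request handler in middleware:
function withRequestContext(request_uuid, handler) {
    return requestContext.run({ request_uuid }, handler);
}

// Call from a query-builder "before write" hook; no explicit plumbing needed.
function currentRequestUuid() {
    return requestContext.getStore()?.request_uuid ?? null;
}
```

A query-builder hook can then set latest_request_uuid to currentRequestUuid() on every insert and update, which is what makes the system automatic rather than opt-in.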
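For step 4's invisible-column suggestion: if you're on MySQL 8.0.23 or newer, the ALTER from step 1 can mark the column INVISIBLE so that select * never returns it. A sketch (the helper name is mine):

```javascript
// Sketch: generate the ALTER with MySQL's INVISIBLE modifier (8.0.23+),
// so `select *` skips the column unless it is named explicitly.
function invisibleRequestUuidDDL(table_name) {
    return `ALTER TABLE ${table_name}
        ADD COLUMN latest_request_uuid binary(16) DEFAULT NULL INVISIBLE,
        ALGORITHM=INSTANT;`;
}
```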

The main thing not covered by this approach is hard deletes. For many systems, hard deletes are relatively rare and have more customized audit logging.4

I’m also glossing over updates a little bit. If you have a change data capture system for your database (like Maxwell’s Daemon), storing only the latest_request_uuid is sufficient because you’ll have a record of all changes. If you don’t have something like that in place, you might consider storing an array of all the request_uuids that modified a row, but that does bloat storage.
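If you do go the array route, one option in MySQL is a JSON column that gets appended to on each write. A sketch, with hypothetical table and column names:

```javascript
// Sketch: append the current request's uuid to a JSON array column
// instead of overwriting a single value, using MySQL's JSON_ARRAY_APPEND.
const append_request_uuid_sql = `
    UPDATE my_table
    SET request_uuids = JSON_ARRAY_APPEND(
        COALESCE(request_uuids, JSON_ARRAY()), '$', ?)
    WHERE id = ?;`;
```

The COALESCE handles rows where the column is still NULL from the original migration.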

request_uuid increases system legibility

A common piece of software engineering wisdom is that we spend more time reading code than writing it. In a similar way, I think we all spend more time debugging a system’s behavior than developing new features for it, so it’s worth putting real effort into increasing the overall legibility of the system you’re working on.

Adding audit tables, extra columns, admin views, and similar legibility tooling can feel unnecessary in the moment, but over the long term it spares engineers from spending hours tracing through code to piece together what happened when they could instead start from a clear picture of a user’s state.

Adding request_uuid tracking to every table was one of those changes that felt obvious in retrospect. Why had we made it so hard to tell exactly which request had changed a row in a table? It’s one of those things that you don’t need too often, but it makes life so much easier when you do need it.


  1. UUIDs come in various formats, but I like V7 UUIDs. V7 UUIDs encode the timestamp in the first bits of the id, which makes them sortable and lets you pick a date partition to query. (This property also makes them more performant as primary keys than other variants.)

    const { uuidv7 } = require("uuidv7");
    const [timePortionA, timePortionB, ...rest] = uuidv7().split("-");
    const millisecondsSinceEpoch = parseInt([timePortionA, timePortionB].join(""), 16);
    console.log(`uuid was created at ${new Date(millisecondsSinceEpoch)}`);
    

    If your system has jobs that update data, you’ll want some sort of trace ID for those jobs as well. I use a custom V8 UUID format that encodes the timestamp the same way V7 UUIDs do, along with a hash of the job name:

    const { uuidv7 } = require("uuidv7");
    const { hash } = require("node:crypto"); // crypto.hash() requires Node 21.7+

    /**
     * Generate a UUID v8 with a timestamp, 0s, and an encoded job name.
     *
     * 48 bits: date (matching the uuidv7 date encoding) (2^48 = 16^12)
     * "8000": 8 = version 8, custom UUID https://www.ietf.org/archive/id/draft-peabody-dispatch-new-uuid-format-01.html#name-uuidv8-layout-and-bit-order
     * "0000": more 0s to make this "pop" visually
     * 48 bits (12 hex characters): hashed job name
     */
    function generateJobTrackingUUID(jobName: string) {
        const [timePortionA, timePortionB] = uuidv7().split("-");
        return [
            timePortionA,
            timePortionB,
            "8000", // 8 is the version
            "0000", // to make this "pop" visually
            hash("sha256", jobName, "hex").slice(0, 12),
        ].join("-");
    }
    
     ↩︎
  2. Why response_code and http_status? If a route has multiple reasons that it can return a 4xx status to a client, you’ll want some way for both the client and the backend to denote the precise reason that a request got a 4xx.

    A 4xx “error” might not necessarily even be an error from the system’s perspective: think of hitting a /session route to see whether a session is still fresh. We expect 400s on that route, so the internal response_code is what lets both the client and the backend tell an expected “session expired” apart from a genuine failure. ↩︎

  3. Adding an extra column (or columns) to a table does increase how much data you’re storing! It may make sense to have one or two thoughtful exceptions to these rules if adding 16 bytes per row to a table will cause problematic data growth. ↩︎

  4. One useful pattern for hard deletes is first querying by primary key to find the rows that you want to delete, storing a (partial) record of that row in a locked down audit table with appropriate lifecycle rules, and then executing the delete statement. You obviously can’t retain data indefinitely, but retaining data in backups for a week is generally permissible if disclosed in your data processing agreements. ↩︎
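    As a sketch of that pattern (the query function, table, and column names are all hypothetical):

```javascript
// Sketch of the audited hard-delete pattern. `query` stands in for your
// database client; the table and column names are hypothetical.
async function auditedHardDelete(query, user_id, request_uuid) {
    // 1. Fetch the rows to delete, by primary key.
    const rows = await query(`SELECT * FROM users WHERE id = ?`, [user_id]);
    // 2. Store a (partial) snapshot in a locked-down audit table that has
    //    its own retention and access rules.
    for (const row of rows) {
        await query(
            `INSERT INTO deleted_users_audit
                (user_id, snapshot, deleted_by_request_uuid, deleted_at)
             VALUES (?, ?, ?, NOW());`,
            [row.id, JSON.stringify(row), request_uuid]
        );
    }
    // 3. Only then execute the delete itself.
    await query(`DELETE FROM users WHERE id = ?`, [user_id]);
}
```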