MongoDB Performance is Disk-graceful

These are all my views, not the views of my employer. A less inflammatory thesis might be "MongoDB requires different schema patterns & understanding of its storage engine to optimize performance"... but that's less fun

MongoDB does not scale well. Yes, you can easily shard a database. Yes, average write speed can be fast. Yes, there are schema patterns you can use to get impressive read performance. But, fundamentally, I've found that MongoDB is much harder to scale than traditional relational databases for read heavy workloads, that it has much more unpredictable tail-end latency, and that it requires more computing resources, especially disk and RAM, to get the same level of performance that you get from relational databases. In this post, I'm going to focus on MongoDB's lack of clustered indexes, and the problems that causes for organizing and optimizing performance for collections of large lists of things.

Should you care about this performance difference? Probably not! The MongoDB developer experience is great in a lot of ways, and the money a company spends on a database might be tiny compared to other costs. If a document data-store is a good fit for your needs, the benefits you gain in increased developer productivity will outweigh any scalability & performance drawbacks.

MongoDB collections aren't index-organized

In storage engines like InnoDB, rows for a table live in an "index-organized b-tree" (also called a "clustered index"). Similar primary keys are likely to be in similar locations on disk. This means that if I have a table message with a primary key of channelId, messageId, messages in the same channel will be located in the same on-disk pages. Percona's excellent article, Tuning InnoDB Primary Keys is a good introduction to how important controlling the primary key of an index organized table can be.

MongoDB does not support index-organized collections. It instead uses an internal auto-increment recordId as a primary key (not the auto-generated _id). There's no way to tell MongoDB that documents should be organized near each other on disk except by changing how you structure data so that it's stored in a single document. To store messages near each other on disk, you wouldn't be able to store them as a collection of messages: instead, you'd want to have a collection of channel_mesage that would likely look something like {channelId: ObjectId, messages: Array<Message>, pageNumber: number}.

Controlling the primary key of an index-organized table is an amazing tool to optimize read performance. Without control of the primary key, your data can end up scattered all over the disk. If you're building a messaging app, you probably want all of the messages in a shared channel to be in roughly the same spot on disk. If you're storing portfolio posts for a class, you probably want all of a class's portfolio posts to be in roughly the same spot on disk. Any time you're dealing with large lists, you probably don't want your database's hard drive to go on an epic treasure hunt to find all of them.

SSD random I/O is as fast as sequential, but documents are much smaller than 4kb

SSDs make disk access a ton faster, and they even make random I/O almost as fast as sequential I/O. But, when we talk about random I/O for a hard-drive, we're talking about the number of (normally 4 kilobyte) pages that we can read into memory. That's a lot of data! Most documents I've dealt with are in the 50-200 byte range: Dates take 12 bytes, Mongo ObjectIds are 12 bytes, and this whole sentence took ~150 bytes. If you have 50 byte documents, that means you can fit 80 of these documents on a page. If you need to look up 1000 documents, that could be ~1000 page reads if they're not organized by primary key, or it could be ~15 page reads if they're organized by an appropriate primary key.

Improving the page layout not only helps reduce disk seeks, it also makes the file system file cache more effective because the working set fits in fewer on-disk pages, so the pages that the file-system cache is able to keep more useful pages in memory.

We're now roughly 4 kilobytes of our way through this post. You can store a lot in a 4 kilobyte disk page!

MongoDB schema patterns can help

There are schema design patterns that you can use to work around the lack of primary key control. Say you have a list of messages in a channel between users. Rather than having a document per-message, you can instead have ChannelMessage documents that you push new messages into: {_id: channelId, messages: [{messageId, entityId, "Mongo: a discarded item that is picked up, retrieved, and rescued"}]}. While this solves performance issues, it complicates the code that you need to write to deal with messages. In a relational database, creating this table with a primary key of (channelId, messageId) completely solves this problem, and then lets you write simple code to operate on individual messages.

Apart from creating grouping documents, it's relatively common to denormalize data to avoid extra look-ups, and to add some sort of cache to many of these queries. Caching can help performance, but caching & denormalizing should be a last resort.

There are other crazier things you can do to get better performance: if you delete and recreate a document with the exact same data in a transaction, you can generate a new primary key. If you'd like a faux primary key, you can delete and recreate documents in the correct order to hopefully put them close together on disk. If you do that nightly or weekly, you can reduce the number of required disk page reads. I wouldn't recommend doing this unless you're in a truly desperate situation.

Checkpoints need extra disk i/o

When you make a MongoDB write, you're not actually writing a document to disk. Instead, that insert or update gets written to an on-disk journal every 100ms (unless you specify j: true... which I'd recommend doing for any data you care about). Every 60s, MongoDB writes a checkpoint to disk. If you have lots of updates, this can be a decent amount of writes, and it can max out disk i/o. Unlike many other databases, MongoDB isn't doing writes as they happen: this can perform well if you're often updating the same document, but it does mean that when provisioning disk i/o, you need to have enough headroom to support these writing spikes. Otherwise, you'll see mysterious and surprising read latency spikes every few minutes.

The cost of MongoDB performance

All of this performance talk is not to say that MongoDB is the wrong database for a company. It's just a database that can cost quite a bit to get good performance out of when you're at scale: you either need to provision quite a bit of disk IOPS and RAM, or you need to make changes to your schemas. As long as you're willing to pay those costs, MongoDB can be a good choice.