open life blog

Automated System Performance Testing at MongoDB - the end of a trilogy

We are pleased to announce that our paper Automated System Performance Testing at MongoDB will be presented at DBTest 2020, a workshop held in conjunction with SIGMOD/PODS. It is available today on

This paper presents the framework we developed to do automated performance testing on realistic MongoDB clusters. We have used and evolved this system to run hundreds of benchmarks every day as part of our Continuous Integration system. In conjunction with publishing this paper, we have also finally open sourced the Python code for this framework. The framework is called Distributed Systems Infrastructure 2.0, or DSI for short.

My conference talks on YouTube

My kids watch a lot of YouTube. They follow the famous Finnish YouTubers every week. At some point my son realized there are many videos on YouTube of his father giving conference talks. Some of them have a thousand views. I've never gotten as much adoration and respect from my son as I did that day!

I've created a playlist of all my conference talks that have been published on YouTube.

Paper review: Strong and Efficient Consistency with Consistency-Aware Durability

Mark Callaghan pointed me to a paper for my comments: Strong and Efficient Consistency with Consistency-Aware Durability by Ganesan, Alagappan and Arpaci-Dusseau ^2. It won a Best Paper award at the USENIX FAST '20 conference. The paper presents a new consistency level for distributed databases where reads are causally consistent with other reads but not (necessarily) with writes.

My comments are mostly on section 2 of the paper, which describes the current state of the art and the motivation for their work.

Writing a data loader for database benchmarks

A task that I've done many times in my career in databases is to load data into a database as a first step in some benchmark. To do it efficiently you want to use multiple threads. Dividing the work across many threads requires a good grasp of third grade math, yet can be surprisingly hard to get right.

The typical setup is often like this:

  1. The benchmark framework launches N independent threads. For example in Sysbench these are completely isolated Lua environments with no shared data structures or communication possible between the threads.
  2. Each thread gets as input its thread id i and the total number of threads launched N.
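To make the arithmetic concrete, here is a minimal sketch of one way a thread could work out its own share of the rows from nothing but i and N. It assumes thread ids are zero-based and that the total number of rows is known up front; the function name `thread_range` and the parameter `total_rows` are made up for this illustration and do not come from Sysbench or any other framework.

```python
def thread_range(thread_id: int, num_threads: int, total_rows: int) -> range:
    """Return the contiguous block of row numbers owned by one thread.

    Assumes zero-based thread ids (0 .. num_threads - 1). Rows are numbered
    0 .. total_rows - 1. When total_rows is not divisible by num_threads,
    the first `extra` threads each take one additional row.
    """
    if not 0 <= thread_id < num_threads:
        raise ValueError("thread_id must be in [0, num_threads)")
    base, extra = divmod(total_rows, num_threads)
    # Every earlier thread took `base` rows, and the earlier threads with
    # id < extra took one more each.
    start = thread_id * base + min(thread_id, extra)
    count = base + (1 if thread_id < extra else 0)
    return range(start, start + count)


if __name__ == "__main__":
    # Show how 10 rows are split across 3 threads.
    for i in range(3):
        rows = thread_range(i, 3, 10)
        print(i, list(rows))
```

The point of handing out the remainder one row at a time is that no thread ends up with more than one row above any other, and no row is loaded twice or skipped, which are the two mistakes that are easiest to make here.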

Automatic retries in MongoDB

At work we were discussing whether MongoDB will retry operations in some circumstances or whether the client needs to be prepared to do so. After a while we realized that different participants in the discussion were talking about different kinds of retries.

So I sat down to get to the bottom of all the retries that can happen in MongoDB, and write a blog post about them. But after googling a bit it turned out someone had already written that blog post, so this will be a short post for me linking to other posts.

Retries by the driver

If you set retryWrites=true in your MongoDB connection string, then the driver will automatically retry some write operations for some types of failures. Ok, can I be more specific? Yes I can...
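As a small illustration, retryWrites is an ordinary option in the query part of the connection string. The sketch below only manipulates the string with the Python standard library and does not talk to a server; the helper name `with_retry_writes` and the example hosts are made up for this illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_retry_writes(uri: str, enabled: bool = True) -> str:
    """Return `uri` with the retryWrites option set to true or false.

    Hypothetical helper for illustration only. Any existing retryWrites
    option is replaced; all other options are kept in their original order.
    """
    parts = urlsplit(uri)
    options = [
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key.lower() != "retrywrites"
    ]
    options.append(("retryWrites", "true" if enabled else "false"))
    # Options must follow a "/" even when no database name is given.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(options), parts.fragment)
    )


if __name__ == "__main__":
    print(with_retry_writes("mongodb://localhost:27017"))
    print(with_retry_writes("mongodb://localhost:27017/test?w=majority"))
```

The resulting string is what you would then hand to your driver's client constructor.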

Let's rewrite everything from scratch (Drizzle eulogy)

Earlier this year I performed my last act as Drizzle liaison to SPI by requesting that Drizzle be removed from the list of active SPI member projects and that about 6000 USD of donated funds be moved to the SPI general fund.

The Drizzle project started in 2008, when Brian Aker and a few other MySQL employees were moved to Sun Microsystems' CTO labs. The background to why there was demand for such a project was, in my opinion, twofold:

Node failures in Serializable Snapshot Isolation

Previously in this series: Reading about Serializable Snapshot Isolation.

Last week I took a deep dive into articles on Serializable Snapshot Isolation. It ended on a sad note, as I learned that to extend SSI to a sharded database using 2PC for distributed transactions[1], there is a need to persist - which means replicate - all read sets in addition to all writes.

This conclusion has been bothering me, so before diving into other papers on distributed serializable transactions, I wanted to understand better what exactly happens in SSI when a node (shard) fails. This blog post doesn't introduce any new papers, just more details. And more speculation.

  • [1] For example, MongoDB
