open life blog

Paper review: Strong and Efficient Consistency with Consistency-Aware Durability

Mark Callaghan pointed me to a paper for my comments: Strong and Efficient Consistency with Consistency-Aware Durability by Ganesan, Alagappan and Arpaci-Dusseau ^2. It won Best Paper award at the Usenix Fast '20 conference. The paper presents a new consistency level for distributed databases where reads are causally consistent with other reads but not (necessarily) with writes.

My comments are mostly on section 2 of the paper, which describes current state of the art and a motivation for their work.

Writing a data loader for database benchmarks

A task that I've done many times in my career in databases is to load data into a database as a first step in some benchmark. To do it efficiently you want to use multiple threads. Dividing the work onto many threads requires good comprehension of third grade math, yet can be surprisingly hard to get right.

The typical setup is often like this:

  1. The benchmark framework launches N independent threads. For example in Sysbench these are completely isolated Lua environments with no shared data structures or communication possible between the threads.
  2. Each thread gets as input its thread id i and the total number of threads launched N.

Automatic retries in MongoDB

At work we were discussing whether MongoDB will retry operations in some circumstances or whether the client needs to be prepared to do so. After a while we realized different participants in the discussion were discussing different retries.

So I sat down to get to the bottom of all the retries that can happen in MongoDB, and write a blog post about them. But after googling a bit it turns out someone has already written that blog post, so this will be a short post for me linking to other posts.

Retries by the driver

If you set retryWrites=true in your MongoDB connection string, then the driver will automatically retry some write operations for some types of failures. Ok, can I be more specific? Yes I can...

Let's rewrite everything from scratch (Drizzle eulogy)

Earlier this year I performed my last act as Drizzle liaison to SPI by requesting that Drizzle be removed from the list of active SPI member projects and that about 6000 USD of donated funds be moved to the SPI general fund.

Drizzle project started in 2008, when Brian Aker and a few other MySQL employees were moved to Sun Microsystem's CTO labs. The background to why there was demand for such a project was in my opinion twofold:

Node failures in Serializeable Snapshot Isolation

Previously in this series: Reading about Serializeable Snapshot Isolation.

Last week I took a deep dive into articles on Serializeable Snapshot Isolation. It ended on a sad note, as I learned that to extended SSI to a sharded database using 2PC for distributed transactions1, there is a need to persist - which means replicate - all read sets in addition to all writes.

This conclusion has been bothering me, so before diving into other papers on distributed serializeable transactions, I wanted to understand better what exactly happens in SSI when a node (shard) fails. This blog doesn't introduce any new papers, just more details. And more speculation.

  • 1. for example, MongoDB

Reading about Serializeable Snapshot Isolation

Previously in this series: Toward strict consistency in distributed databases.

Apparently, when Snapshot Isolation was invented, it was received with great enthusiasm, since it avoids all the anomalies familiar from the SQL standard. It was only as time went by that people discovered that SI had some completely new anomalies and therefore wasn't equivalent to serializeable isolation. (This I learned from the PostgreSQL paper below.)

Toward strict consistency in distributed databases

July is vacation time in Finland. Which means it is a great time to read blogs and papers on consistency in distributed databases! This and following blogs are my reading notes and thoughts. Mostly for my own benefit but if you benefit too, that's great. Note also that my ambition is not to appear like an authority on consistency. I just write down questions I had and things I learned. If you have comments, feel free to write some below. That's part of the point of blogging this!

About the bookAbout this siteAcademicAmazonBeginnersBooksBuildBotBusiness modelsbzrCassandraCloudcloud computingclsCommunitycommunityleadershipsummitConsistencycoodiaryCopyrightCreative CommonscssDatabasesdataminingDatastaxDevOpsDrizzleDrupalEconomyelectronEthicsEurovisionFacebookFrosconFunnyGaleraGISgithubGnomeGovernanceHandlerSocketHigh AvailabilityimpressionistimpressjsInkscapeInternetJavaScriptjsonKDEKubuntuLicensingLinuxMaidanMaker cultureMariaDBmarkdownMEAN stackMepSQLMicrosoftMobileMongoDBMontyProgramMusicMySQLMySQL ClusterNerdsNodeNoSQLodbaOpen ContentOpen SourceOpenSQLCampOracleOSConPAMPPatentsPerconaperformancePersonalPhilosophyPHPPiratesPlanetDrupalPoliticsPostgreSQLPresalespresentationsPress releasesProgrammingRed HatReplicationSeveralninesSillySkySQLSolonSunSybaseSymbiansysbenchtalksTechnicalTechnologyThe making ofTungstenTwitterUbuntuvolcanoWeb2.0WikipediaWork from HomexmlYouTube