open life blog

What's in a database storage engine

I overheard - over-read, really - an internet discussion about database storage engines. The discussion was about what functionality is considered part of a storage engine, and what functionality is in the common parts of the database server. My first reaction was something like "how isn't this obvious?" Then I realized for a lot of the database functionality it isn't obvious at all and the answer really is that it could be either way.

Microverse guest lecture: How to build a career working in Open Source (and also remotely)

In 8th grade, our home economics class got an assignment to create a scrapbook describing our adult life. Students would cut and paste - I mean literally, with scissors - pictures from magazines to create a portfolio with their 3 bedroom houses, one or two cars, a dog, and so on.

I of course did it a bit differently. I spent hours drawing a log cabin that my future family would live in. The cabin was in the wilderness in Lapland, where I operated some tourism related company as a family business. This was a few years before I would hear about the internet, or own a cell phone. But in my dream future I used a fax and satellite phone to operate my business. Btw, it was also years before I heard the word "startup" :-D

Automated System Performance Testing at MongoDB - the end of a trilogy

We are pleased to announce that our paper Automated System Performance Testing at MongoDB will be presented at DBTest 2020. (A workshop in conjunction with SIGMOD/PODS.) It is available today on

This paper presents the framework we developed to do automated performance testing on realistic MongoDB clusters. We have used and evolved this system to run hundreds of benchmarks every day as part of our Continuous Integration system. In conjunction with publishing this paper, we have also finally open sourced the Python code to this framework. The framework is called Distributed Systems Infrastructure 2.0, or DSI for short.

My conference talks on Youtube

My kids watch a lot of youtube. They follow the famous Finnish youtubers every week. At some point my son had realized there are many videos on youtube with his father doing conference talks. Some of them have a thousand viewers. I've never gotten so much adoration and respect from my son as that day!

I've created a playlist of all my conference talks that have been published on youtube.

Paper review: Strong and Efficient Consistency with Consistency-Aware Durability

Mark Callaghan pointed me to a paper for my comments: Strong and Efficient Consistency with Consistency-Aware Durability by Ganesan, Alagappan and Arpaci-Dusseau ^2. It won Best Paper award at the Usenix Fast '20 conference. The paper presents a new consistency level for distributed databases where reads are causally consistent with other reads but not (necessarily) with writes.

My comments are mostly on section 2 of the paper, which describes current state of the art and a motivation for their work.

Writing a data loader for database benchmarks

A task that I've done many times in my career in databases is to load data into a database as a first step in some benchmark. To do it efficiently you want to use multiple threads. Dividing the work onto many threads requires good comprehension of third grade math, yet can be surprisingly hard to get right.

The typical setup is often like this:

  1. The benchmark framework launches N independent threads. For example in Sysbench these are completely isolated Lua environments with no shared data structures or communication possible between the threads.
  2. Each thread gets as input its thread id i and the total number of threads launched N.
About the bookAbout this siteAcademicAmazonBooksBuildBotBusiness modelsbzrCassandraCloudcloud computingclsCommunitycommunityleadershipsummitConsistencycoodiaryCopyrightCreative CommonscssDatabasesdataminingDatastaxDevOpsDrizzleDrupalEconomyelectronEthicsEurovisionFacebookFrosconFunnyGaleraGISgithubGnomeGovernanceHandlerSocketHigh AvailabilityimpressionistimpressjsInkscapeInternetJavaScriptjsonKDEKubuntuLicensingLinuxMaidanMaker cultureMariaDBmarkdownMEAN stackMepSQLMicrosoftMobileMongoDBMontyProgramMusicMySQLMySQL ClusterNerdsNodeNoSQLodbaOpen ContentOpen SourceOpenSQLCampOracleOSConPAMPPatentsPerconaperformancePersonalPhilosophyPHPPiratesPlanetDrupalPoliticsPostgreSQLPresalespresentationsPress releasesProgrammingRed HatReplicationSeveralninesSillySkySQLSolonSunSybaseSymbiansysbenchtalksTechnicalTechnologyThe making ofTungstenTwitterUbuntuvolcanoWeb2.0WikipediaWork from HomexmlYouTube