The paper stating the RUM conjecture was published by a group of Harvard DASLab researchers in 2016. They have also created a more easily digestible RUM conjecture home page with graphics. Yet in this blog post I try to describe the idea in even simpler terms than that page does.
An engineer I work with asked me for tips on what to read about database benchmarking. I told him I've learned a lot from reading Mark Callaghan's blog. Now that I think about it, articles and conference talks from Baron Schwartz were just as fundamental, or even more so, early on when I was getting started.
When I choose technologies to use, or employers to work for, my approach is to stick with a few things I believe in. Datastax happens to tick quite a few of those boxes:
I overheard - over-read, really - an internet discussion about database storage engines. The discussion was about what functionality is considered part of a storage engine, and what functionality belongs in the common parts of the database server. My first reaction was something like "how isn't this obvious?" Then I realized that for a lot of database functionality it isn't obvious at all, and the answer really is that it could go either way.
My kids watch a lot of youtube. They follow the famous Finnish youtubers every week. At some point my son realized there are many videos on youtube of his father doing conference talks. Some of them have a thousand views. I've never gotten so much adoration and respect from my son as on that day!
I've created a playlist of all my conference talks that have been published on youtube.
Mark Callaghan pointed me to a paper for my comments: Strong and Efficient Consistency with Consistency-Aware Durability by Ganesan, Alagappan and Arpaci-Dusseau. It won the Best Paper award at the USENIX FAST '20 conference. The paper presents a new consistency level for distributed databases, where reads are causally consistent with other reads but not (necessarily) with writes.
My comments are mostly on section 2 of the paper, which describes the current state of the art and the motivation for their work.
A task that I've done many times in my career in databases is to load data into a database as the first step of some benchmark. To do it efficiently you want to use multiple threads. Dividing the work across many threads requires nothing more than third-grade math, yet it can be surprisingly hard to get right.
The typical setup is often like this:
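As a rough illustration of the kind of arithmetic involved (not the original post's setup - the row count, thread count, and function names here are invented), a minimal Python sketch of splitting a contiguous id range across loader threads might look like this. The remainder handling is exactly the third-grade math that is easy to get subtly wrong:

```python
# Hypothetical sketch: split a contiguous id range [1, total_rows] across
# num_threads loader threads. The numbers below are made up for illustration.
import threading

def split_range(total_rows, num_threads):
    """Return inclusive (start, end) id ranges, one per thread.

    Remainder rows are spread over the first few threads so that no rows
    are dropped and no two ranges overlap -- the classic off-by-one traps
    when dividing work like this.
    """
    base, remainder = divmod(total_rows, num_threads)
    ranges = []
    start = 1
    for i in range(num_threads):
        count = base + (1 if i < remainder else 0)
        ranges.append((start, start + count - 1))
        start += count
    return ranges

def load_rows(start, end):
    # Placeholder for the real work: insert rows with ids start..end
    # into the target table, typically in batches.
    pass

if __name__ == "__main__":
    threads = [threading.Thread(target=load_rows, args=(s, e))
               for s, e in split_range(total_rows=1_000_000, num_threads=8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```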
Here are the slides for my HighLoad++ talk tomorrow:
Previously in this series: Reading about Serializable Snapshot Isolation.
Last week I took a deep dive into articles on Serializable Snapshot Isolation. It ended on a sad note, as I learned that to extend SSI to a sharded database using 2PC for distributed transactions, there is a need to persist - which means replicate - all read sets in addition to all writes.
This conclusion has been bothering me, so before diving into other papers on distributed serializable transactions, I wanted to understand better what exactly happens in SSI when a node (shard) fails. This blog post doesn't introduce any new papers, just more details. And more speculation.