NoSQL

A scalability model for Cassandra

One thing that struck me when reading up on Cassandra is the very strong mindset in the Cassandra community around linear scalability, and hence around primary-key-based data models. So de-normalizing your data, for example by using materialized views, is considered a best practice.

However, de-normalization has some challenges of its own. Both Cassandra-managed materialized views and any application-side managed denormalization run the risk of becoming inconsistent. And of course it also means you're multiplying your database size.
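
To make the consistency risk concrete, here is a minimal sketch in Python of application-managed denormalization. The two dicts stand in for a base table and a hand-maintained lookup table keyed by email. All names (`users_by_id`, `users_by_email`, `update_email`) and the simulated failure are illustrative assumptions; none of this is Cassandra's API.

```python
# Toy model: two dicts stand in for a base table and a denormalized lookup table.
users_by_id = {}      # primary-key lookup: user_id -> record
users_by_email = {}   # denormalized copy: email -> record


def insert_user(user_id, email, name):
    record = {"id": user_id, "email": email, "name": name}
    users_by_id[user_id] = dict(record)
    users_by_email[email] = dict(record)   # the data is now stored twice


def update_email(user_id, new_email, fail_between_writes=False):
    """Update both copies; no transaction spans the two writes."""
    old_email = users_by_id[user_id]["email"]
    users_by_id[user_id]["email"] = new_email          # write 1 succeeds
    if fail_between_writes:
        raise RuntimeError("simulated crash before the second write")
    record = users_by_email.pop(old_email)             # write 2
    record["email"] = new_email
    users_by_email[new_email] = record


def is_consistent():
    """True if every base record is found under its current email in the copy."""
    return len(users_by_email) == len(users_by_id) and all(
        users_by_email.get(rec["email"], {}).get("id") == uid
        for uid, rec in users_by_id.items()
    )


insert_user(1, "alice@example.com", "Alice")
try:
    update_email(1, "alice@example.org", fail_between_writes=True)
except RuntimeError:
    pass
print(is_consistent())  # False: the copy still lists the old email
```

The same shape of failure applies whenever two writes that must agree are not atomic, and the second dict also shows where the extra storage goes.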

20 years later, what's left of the CAP theorem?

The CAP theorem was published in (party like it's...) 1999: Fox Armando, Brewer Eric A: Harvest, Yield, and Scalable Tolerant Systems.

Since its publication it has provided a beacon and rallying cry around which web-scale distributed databases could be built and debated. It(s interpretation) has also evolved. Quite quickly the original 1999 formulation was abandoned, and from there it has further eroded as real-world database implementations have provided ever finer-grained trade-offs for navigating the space that - after all - was correctly mapped out by the CAP theorem.

Pick ANY two? Really?

Slides from Failover or not Failover, that is the question

Below are the slides from my last talk at this Percona Live Worldwide MySQL Conference. The idea for this talk was proposed by my co-presenter Massimo Brignoli and goes back to a debate on this topic that went through the MySQL blogosphere last autumn - which in itself was sparked by an outstanding retrospective published about a MySQL failure at GitHub.

Designing a HTTP JSON database api

A few weeks ago I blogged about the HTTP JSON API in Drizzle. (See also a small demo app using it.) In this post I want to elaborate a little on the design decisions taken. (One reason to do this is to provide a foundation for future work, especially in the form of a GSoC project.)

Looking around: MongoDB, CouchDB, Metabase

Simple GUI to edit JSON records in Drizzle

So yesterday I introduced the newly committed HTTP JSON key-value interface in Drizzle. The next step of course is to create some simple application that uses it to store data. This serves both as an example use case and as a way for me to get a feeling for whether this makes sense as a programming paradigm.

Personally, I have been a fan of the schemaless key-value approach ever since I graduated from university and started doing projects with dozens of tables and hundreds of columns in total. Especially in small projects I always found the array structures in languages like PHP, Perl and Python to be very flexible to develop with. As I was developing and realized I needed a new variable or new data field somewhere, it was straightforward to just toss a new key-value pair into the array and continue writing code. No need to go back and edit some class definition. If I ever needed to find out what is available in some struct, I could always do var_dump($obj) to find out. Even large projects like Drupal get along with this model very well.
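
As a small illustration of that workflow, here is a sketch in Python, one of the languages mentioned above. The record and its field names are made up for the example.

```python
import json
from pprint import pformat

# A record is just a dict; there is no class definition to go back and edit.
person = {"name": "Ada", "email": "ada@example.com"}

# Later in development a new field turns out to be needed: just add it.
person["twitter"] = "@ada"
person["tags"] = ["math", "engines"]

# Inspect what the structure currently holds (comparable to dumping a variable in PHP).
print(pformat(person))

# The same structure serializes directly to a JSON document for storage,
# and comes back unchanged when loaded again.
document = json.dumps(person, sort_keys=True)
restored = json.loads(document)
print(restored == person)
```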

Drizzle JSON HTTP interface now with key-value support

The thing I really like about open source is the feeling you get when people just show up from nowhere and do great things to some code you originally wrote. Thanks to this miracle, I can now also present to you version 0.2 of the Drizzle JSON HTTP support, featuring a "pure JSON key-value API" in addition to the original "SQL over HTTP" API in version 0.1. Let's recap what happened:

  1. At Drizzle Day 2011, I proposed that Drizzle should make available a JSON NoSQL interface. Stewart took the bait and published json_server 0.1 a week later. This API still uses SQL, it's just that the client protocol is HTTP and JSON, into which the SQL is embedded. So I suppose it's not as sexy as real NoSQL, but pretty cool nevertheless.
  2. At the end of last summer I had a lot of dead time, so I started playing with Stewart's code to see what I could do. I added a new API in addition to Stewart's SQL-over-HTTP API that supports key-value operations in pure JSON, similar to what you see in CouchDB, MongoDB or Metabase. I got it working quite well. However, I never implemented a DELETE command, because I then drifted off to more important tasks, such as revamping the Drizzle version numbering and bringing DEB and RPM packaging up to date.
  3. Last week a new but very promising Drizzle hacker called Mohit came by, looking for things he could do. He had already fixed a simple low-hanging bug and wanted something more. Since he was interested in the JSON API, I asked if he wanted to finish the missing piece. My helpful advice was "there is no documentation but if you look at the demo GUI you'll probably figure it out, then just look at the code for POST and implement DELETE instead". I was afraid that wasn't really helpful, but I was busy that day. No problem, the next day Mohit had pushed a working DELETE implementation. The day after that he implemented the final missing piece, CREATE TABLE functionality. I was both impressed and excited.
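
For readers who have not used a document store, the key-value style described in the steps above can be sketched roughly as follows. This is a hypothetical in-memory stand-in written in Python, not the json_server code or its actual API; the class name, the `_id` field and the reply formats are all assumptions made for illustration.

```python
import json


class KeyValueTable:
    """Hypothetical in-memory stand-in for one key-value table."""

    def __init__(self):
        self._rows = {}
        self._next_id = 1

    def post(self, body):
        """Store a JSON document; assign an _id if the client gave none."""
        doc = json.loads(body)
        if "_id" not in doc:
            while self._next_id in self._rows:
                self._next_id += 1
            doc["_id"] = self._next_id
        self._rows[doc["_id"]] = doc          # an existing _id is overwritten
        return json.dumps({"_id": doc["_id"]})

    def get(self, doc_id):
        """Return the stored document as JSON, or an error object."""
        doc = self._rows.get(doc_id)
        if doc is None:
            return json.dumps({"error": "not found"})
        return json.dumps(doc)

    def delete(self, doc_id):
        """Remove a document; report whether it existed."""
        existed = self._rows.pop(doc_id, None) is not None
        return json.dumps({"deleted": existed})


table = KeyValueTable()
reply = json.loads(table.post('{"name": "Ada", "role": "hacker"}'))
fetched = json.loads(table.get(reply["_id"]))
removed = json.loads(table.delete(reply["_id"]))
after = json.loads(table.get(reply["_id"]))
print(fetched, removed, after)
```

Note how closely `delete` mirrors `post`: both parse a key and touch one row, which is why reading the POST code was a workable starting point for implementing DELETE.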

Presentation: Databases and the Cloud (and why it is more difficult for databases)

A week ago I again had the pleasure of giving a guest lecture at Tampere University of Technology. I first visited them when I worked in MySQL pre-sales at Sun.

To be trendy, I of course had to talk about the cloud. It turns out every section has the subtitle "...and why it is more difficult for databases". I also rightfully claim to have invented the NoSQL key-value development model in 2005.

NoSQL performance numbers - MySQL and Redis

Links to performance numbers posted wrt various NoSQL solutions:

A top 20 global website announced they have migrated from MySQL to Redis. There will be a keynote and everything. It doesn't say how big the Redis Cluster is, but they serve 100M pages / day, and clock 300k Redis queries / second.
https://groups.google.com/forum/?fromgroups#!topic/redis-db/d4QcWV0p-YM

Btw, they mention that MySQL remains as the master data store from which the Redis indexes are generated.
(The reason I don't mention the name of this Redis user is simply that I fear my mom sometimes reads my blog...)
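
To put the two quoted figures in relation to each other, here is a rough back-of-the-envelope calculation in Python. It assumes traffic is spread evenly over the day and that the 300k queries / second is an average rather than a peak; the announcement says neither, so treat the result as an order-of-magnitude estimate only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

pages_per_day = 100_000_000       # "100M pages / day" from the announcement
redis_queries_per_sec = 300_000   # "300k Redis queries / second"

# Average page rate, assuming traffic is spread evenly over the day.
avg_pages_per_sec = pages_per_day / SECONDS_PER_DAY

# Queries per page, assuming 300k/s is an average rather than a peak figure.
queries_per_page = redis_queries_per_sec / avg_pages_per_sec

print(f"average pages/second: {avg_pages_per_sec:.0f}")    # 1157
print(f"Redis queries per page: {queries_per_page:.0f}")   # 259
```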

Stored procedures in JavaScript? (My Drizzle repository can do it)

Just wanted to record for the history books that:


drizzle> select js_eval('var d = new Date(); "Drizzle started running JavaScript at: " + d;')\g
+----------------------------------------------------------------------------------+
| js_eval('var d = new Date(); "Drizzle started running JavaScript at: " + d;') |
+----------------------------------------------------------------------------------+
| Drizzle started running JavaScript at: Mon Aug 29 2011 00:23:31 GMT+0300 (EEST) |
+----------------------------------------------------------------------------------+
1 row in set (0.001792 sec)

I will push this onto Launchpad tomorrow, after a good night's sleep and final code cleanups.
