As some of you may have noticed, I recently wrote a trilogy on how to dive into the MySQL Cluster source code. Unfortunately my overtures towards the MySQL Cluster source code ended up being only a look-but-don't-touch affair, as I never actually got to touch her internals with my text editor. Even so, in this post I'd like to tell you about the background to my love affair with this beauty, by describing some benchmarks I've been working on together with my customers.
Oh, and I'd like to apologize in advance: I cannot mention where these benchmarks were done, what the schema looked like, or the exact numbers. If you want that kind of real benchmark, you should read Mikael's blog, or watch the slides from this webinar. (Then compare those results to BigDBAHead's SSD RAID magic with InnoDB; both use the same DBT2 benchmark.)
MySQL Cluster started out as an efficient, scale-out in-memory datastore for key-value pairs. (In that context, it is funny that it is regularly neglected in articles about the new cloud-craze of distributed hash / key-value databases. Have we been focusing too much on messaging this as a telco platform?) Last year an important milestone was reached, when Mikael finally achieved linear scalability for the transactional DBT2 benchmark too. One thing that is not so great yet is queries with non-trivial joins. But even that is being worked on as we speak.
So I've been working on one really interesting project with a customer. Due to the sensitive times we live in, I won't name any competitors here. Let's just say that MySQL Cluster is challenging the "Incumbent Database".
The schema is relatively simple: a few tables that hold user data. Could be phone numbers, could be emails, could be your social network site really. The users are (sometimes) part of a larger hierarchy, and in that case you need to fetch the full chain of the hierarchy. Did I say MySQL Cluster is inefficient with JOINs? Yes, and this worried me at the outset of this project.
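To make "fetch the full chain of the hierarchy" a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `USERS` dictionary, its columns and the `fetch_chain` function are made up for illustration and have nothing to do with the customer's real schema. The dictionary simply stands in for a table keyed by user id, so each step of the loop corresponds to one primary-key lookup. In SQL you would typically express the same walk as a self-join per level of the hierarchy.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a user table: user_id -> (name, parent_id).
# A parent_id of None marks the top of the hierarchy.
USERS: Dict[int, Tuple[str, Optional[int]]] = {
    1: ("root account", None),
    2: ("department", 1),
    3: ("individual user", 2),
    4: ("standalone user", None),
}


def fetch_chain(
    users: Dict[int, Tuple[str, Optional[int]]],
    user_id: int,
    max_depth: int = 32,
) -> List[int]:
    """Return the ids from user_id up to the top of its hierarchy.

    Each loop iteration does one key lookup, which is what a
    primary-key read would be against a real table.
    """
    chain: List[int] = []
    seen = set()
    current: Optional[int] = user_id
    while current is not None:
        # Guard against cycles or runaway depth in bad data.
        if current in seen or len(chain) >= max_depth:
            raise ValueError(f"hierarchy for user {user_id} is cyclic or too deep")
        seen.add(current)
        chain.append(current)
        _name, parent_id = users[current]
        current = parent_id
    return chain


if __name__ == "__main__":
    print(fetch_chain(USERS, 3))  # user 3, then its parent 2, then the root 1
    print(fetch_chain(USERS, 4))  # a user that is not part of any hierarchy
```

The point of the sketch is only that the cost of the lookup grows with the depth of the hierarchy, whether you do it as repeated key lookups or as joins.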
The performance requirements are really tough, as is usual in the telco world, with a high percentage of writes: 33,333 queries / sec, of which 50% are writes! The latency requirement... ok I need to be a bit fuzzy here: low tens of milliseconds for a group of 3 queries. I have to admire the consultants of the Incumbent Database who set up the current system to achieve that kind of write throughput: The Active-Active database is stored on a storage device with (fuzzy again...) slightly less than 200 disk spindles, formatted with a partition that resides only on the outer 10% of each disk's surface, i.e. the most efficient part of the disk. The redo log is written onto a Texas Memory Systems RamSan-500, i.e. a "fake hard disk" that is really a box with lots of RAM and a battery. The servers are Sparc CMT servers and remained so for the MySQL Cluster setup too (made possible by the multithreading in version 7.0).
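To put those requirements into perspective, here is a small back-of-the-envelope calculation in Python. The 33,333 queries / sec, the 50% write share and the groups of 3 queries come from the numbers above. Two things are my own assumptions for the sake of the example: that all queries arrive in groups of 3, and the 20 ms latency figure, which is just a placeholder for "low tens of milliseconds" and not the real requirement.

```python
# Figures stated in the post.
TOTAL_QPS = 33_333          # queries per second
WRITE_FRACTION = 0.5        # 50% of the queries are writes
QUERIES_PER_GROUP = 3       # latency is measured per group of 3 queries

# Assumption for illustration only: a placeholder for "low tens of ms".
ASSUMED_GROUP_LATENCY_S = 0.020


def writes_per_second(total_qps: float, write_fraction: float) -> float:
    """Write queries per second implied by the total rate and write share."""
    return total_qps * write_fraction


def groups_per_second(total_qps: float, queries_per_group: int) -> float:
    """Query groups per second, assuming every query belongs to a group."""
    return total_qps / queries_per_group


def groups_in_flight(group_rate: float, latency_s: float) -> float:
    """Average number of groups being processed at once (Little's law)."""
    return group_rate * latency_s


if __name__ == "__main__":
    writes = writes_per_second(TOTAL_QPS, WRITE_FRACTION)
    groups = groups_per_second(TOTAL_QPS, QUERIES_PER_GROUP)
    in_flight = groups_in_flight(groups, ASSUMED_GROUP_LATENCY_S)
    print(f"writes/sec:        {writes:,.1f}")
    print(f"groups/sec:        {groups:,.1f}")
    print(f"groups in flight:  {in_flight:,.1f} (at the assumed latency)")
```

So the database has to absorb roughly 16,700 writes every second, and under the assumptions above there are a couple of hundred query groups in flight at any given moment.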
And the customer was really happy with that level of performance.
Then came MySQL Cluster. Ok, so I have to admit, we spent some weeks tuning; it's not like you get this just out of the box. We went through various ideas to denormalize the schema to make it more suitable to MySQL Cluster (i.e. to avoid JOINs). In the end, a compromise was chosen: no denormalization, and accept slightly worse latency. Then we worked on the partitioning, benchmarked, re-partitioned, tuned configuration parameters, benchmarked, added more datanode processes (scale-out), benchmarked. And in the end we achieved the goal! Ok, slightly worse latency (but the same order of magnitude, depending on the size of the hierarchy) due to so many joins, but the same throughput.
And the hardware? A slightly newer pair of Sparc CMT servers. At first the MySQL Cluster was using just one local hard disk on each server. As we got closer to the goal, this disk became overloaded and caused some interesting symptoms. So the customer switched to a RAID stripe with 6 disks per server. This has worked fine. Crazy big storage - gone. RamSan - gone. If the customer chooses MySQL Cluster, they will save millions in hardware costs (never mind the small savings in license).
This has been a really interesting project, and this is why I love MySQL Cluster. I happen to think it is a really superior technology with a bright future. Scale-out using standard servers with standard hard disks - no crazy big boxes needed.