Jonathan Levin has an excellent blog post titled How to make MySQL cool again. It is almost word for word something I've wanted to write for a long time. Now I don't need to, thanks Jonathan.
Once again Blogger failed to post my comment to his site, so I will make my comments as a new post of my own instead. Jonathan actually lists things that already exist but aren't being used enough. My list also contains a few things that I don't know whether they exist or not.
This is an excellent post! I've been thinking of writing exactly these points for a while; thanks to you, now I don't have to. Everything you list is low-hanging fruit that basically already exists in some form.
I have some things to add to this list:
Lightweight MapReduce = bashreduce:
br -m "host1 host2" -r "sort" < input > output
Great for one-liners against a set of servers, MySQL or otherwise. But of course use Gearman for serious work.
Transparent sharding.
MySQL Cluster does exactly this, but we need it for InnoDB. Spider Engine should do this? I had not previously heard of ScaleBase, mentioned in a previous comment (on Jonathan's blog). Both Ulf Wendel and Justin Swanhart have separately been blogging about experiments with the MySQL connectors (in PHP, the first with a plugin, the other with a wrapper) that could be used for this.
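To make the connector-based idea concrete, here is a minimal sketch of shard routing done on the client side, in the spirit of those PHP connector experiments. The host names, the sharding key and the simple modulo scheme are my own illustration, not any actual plugin API:

```python
# Hypothetical shard list; in the connector experiments this would
# come from the plugin/wrapper configuration.
SHARDS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def shard_for(user_id, shards=SHARDS):
    """Pick the server for a row, based on the sharding key.

    The application (or connector plugin) calls this before issuing
    the query, so the MySQL servers themselves stay unmodified.
    """
    return shards[user_id % len(shards)]
```

The point is that the routing logic lives entirely in the connector layer, which is why a PHP plugin or wrapper can provide it without any server-side changes.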
Transparent re-sharding aka online add node.
MySQL Cluster does this. We need it for InnoDB too. Spider Engine could probably do this with "very little downtime" as an ALTER TABLE? Implementing an automated solution for this would be kinda cool! The connector-based experiments should easily lend themselves to this.
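One standard technique for online add node, which a connector-side solution could use, is consistent hashing: adding a server only remaps the keys that fall between the new node and its predecessor on the ring, so most data stays put. This is a generic sketch of the technique, not code from any of the projects mentioned:

```python
import hashlib
from bisect import bisect

def _h(key):
    # Stable hash of a key or node name onto the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Minimal consistent-hash ring (no virtual nodes, for brevity)."""

    def __init__(self, nodes):
        self.ring = sorted((_h(n), n) for n in nodes)

    def add(self, node):
        # "Online add node": only keys between the new node's point
        # and its predecessor change owner.
        self.ring = sorted(self.ring + [(_h(node), node)])

    def node_for(self, key):
        points = [p for p, _ in self.ring]
        i = bisect(points, _h(key)) % len(self.ring)
        return self.ring[i][1]
```

Any key whose owner changes after `add()` necessarily maps to the new node, which is what keeps re-sharding cheap compared to a plain modulo scheme.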
Several replication improvements: parallel replication, multi-source replication. Like you say, Tungsten has these, so it is a good add-on to standard MySQL replication. But we also want:
- synchronous master-master replication.
- global transaction IDs (Tungsten has these), which give you easier deployment of new slaves and easier promotion of a new master.
There are various solutions out there: in the Google patch, via Galera replication, etc...
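A toy illustration of why global transaction IDs make failover easier: if every replica can report the last global ID it has applied, picking the most up-to-date one to promote is a one-liner, with no per-master binlog file/offset bookkeeping. The names here are made up for the sketch:

```python
def pick_new_master(replicas):
    """Choose the replica to promote after the master fails.

    replicas: dict mapping replica name -> last applied global
    transaction ID (a single monotonically increasing number).
    """
    return max(replicas, key=replicas.get)

# Without global IDs, each replica's position is a (binlog file,
# offset) pair that is only meaningful relative to the dead master,
# so this comparison is much messier.
```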
Better support for schema-less development. With HandlerSocket you can already get a very fast key-value store: CREATE TABLE t (`key` INT PRIMARY KEY, `value` BLOB) and there you go. (Note that key needs the backquotes, since KEY is a reserved word in MySQL.) But MongoDB and some others take this further: you can actually have something like secondary indexes into that BLOB. MariaDB's virtual columns and dynamic columns are interesting here. (I've blogged about this as a concept: https://openlife.cc/blogs/2010/october/mariadb-52-using-mariadb-column-…. Roland Bouman made some interesting comments to those blogs.)
The thing still completely missing is to embed a JSON parser into MySQL/MariaDB, which would allow you to create the secondary indexes in a user-friendly way. My experiments with XML documents suggest that the performance penalty for using XPath and the existing XML parser is very high compared to using SUBSTR(). Unless the performance of an XML/JSON parser can be brought down to the level of SUBSTR(), this again suggests that the task of maintaining the secondary indexes could be pushed to the client side instead. (E.g. when inserting or updating a record, the client extracts the values needed for secondary indexes and writes them as separate columns that are indexed. From the DB point of view these are then just normal indexed columns, even if they are derived from the JSON document stored in the BLOB.)
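The client-side approach described above can be sketched in a few lines: before writing a row, the client parses the JSON document, pulls out the fields that need secondary indexes, and supplies them as ordinary column values alongside the BLOB. The field and column names here are invented for illustration:

```python
import json

# Fields of the JSON document that we want indexed; these would be
# declared in application/connector configuration, not in the server.
INDEXED_FIELDS = ["country", "age"]

def row_for_insert(key, doc):
    """Build the column values for one INSERT/UPDATE.

    `doc` is the raw JSON string stored in the `value` BLOB. The
    extracted fields become normal indexed columns as far as the
    database is concerned.
    """
    parsed = json.loads(doc)
    index_cols = {f: parsed.get(f) for f in INDEXED_FIELDS}
    return {"key": key, "value": doc, **index_cols}
```

The server never parses the JSON at all; the extra columns look like any other indexed columns, which is exactly why this sidesteps the XPath/parser performance problem.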