Why choose a cloud service, and which one

This is the second part in a series of posts about how the MepSQL packages were built. In part 1 I evaluated the openSUSE Build Service and Launchpad PPA and ended up concluding that running your own BuildBot system is the best choice, as those public services didn't provide any facility to test the packages they build.

This brings us to the next topic: As I don't possess any servers, should I buy one (or more) or should I try out the cloud services? And if I go with the cloud, should I use Amazon EC2 or something else?

Let's look at costs:

Alternative 1a: Cheap server

I could buy a single CPU server, with 1GB RAM for 350-500 EUR. I have an ADSL connection that even allows me to use public IP addresses. If I wanted to use the least possible amount of money, I'd choose this option.

This server is less powerful (but also much cheaper) than my laptop. Builds will take a long time. Development will slow down as each iteration of building and testing takes longer (2x-5x longer, by my estimate).

Alternative 1b: Bigger server

A quad core server with 4 GB RAM would cost around 2500 EUR. This I could afford, but it is quite a large upfront investment in a project whose future I don't know. What if I later want to abandon MepSQL for something else? Or what if MepSQL becomes popular and someone offers to donate a server to the project? Then I'm sitting with a 2500 EUR server doing nothing in my basement.

Alternative 2: Cloud

In many ways, using a cloud service was ideal for this project:

  1. It's a good option for getting started with minimal investment; you can always buy your own server later.
  2. You can provision a few servers in the cloud immediately: no waiting a few days for shipping, no time spent installing Linux...
  3. Space considerations: Don't need to have a server in the living room / Don't need to draw cabling down into the basement.
  4. A build server is a batch job: If I had my own server, it would sit there doing nothing most of the time. In the cloud I can start and stop my servers as needed, and don't pay anything for the unused time.
  5. Can boost capacity when needed: I'll want to build and run tests on a number of platforms - a few dozen ultimately - to support all versions of Linux, Windows and Solaris on different hardware. In the cloud, I can run an "infinite" number of these in parallel without any extra cost, since I pay per server-hour either way. On my own server, I would have to run the virtual servers one after another, since I have only one physical server. (MariaDB does it this way - and in total it takes about 15 hours before the last batch finishes.)
  6. Not only can I run things in parallel, I can also choose how powerful the server instances I launch are each time. On Amazon I can pay 0.02 USD/hour for a small server with 1 CPU, or 17x more for a server with 4 CPUs that gets the job done in less than half the time. Depending on the situation, paying 17x more can be worth the time saved.
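
The trade-off in point 6 can be put into numbers. This is only a minimal sketch: the prices are the 0.02 USD/hour and 17x figures quoted above, while the build durations are hypothetical placeholders (not measurements), and billing is assumed to be linear rather than rounded up to full hours.

```python
# Hourly prices taken from the figures quoted in the list above.
SMALL_USD_PER_HOUR = 0.02
LARGE_USD_PER_HOUR = SMALL_USD_PER_HOUR * 17  # "17x more"


def build_cost(usd_per_hour, build_hours):
    """Cost of one build, assuming linear (not rounded-up) billing."""
    return usd_per_hour * build_hours


def usd_per_hour_saved(small_hours, large_hours):
    """Extra money paid for each hour of waiting that the large instance saves."""
    extra_cost = (build_cost(LARGE_USD_PER_HOUR, large_hours)
                  - build_cost(SMALL_USD_PER_HOUR, small_hours))
    hours_saved = small_hours - large_hours
    return extra_cost / hours_saved


if __name__ == "__main__":
    # Hypothetical durations: 2.0 h on the small instance, 0.9 h on the large one.
    small_hours, large_hours = 2.0, 0.9
    print("small build: %.3f USD" % build_cost(SMALL_USD_PER_HOUR, small_hours))
    print("large build: %.3f USD" % build_cost(LARGE_USD_PER_HOUR, large_hours))
    print("price of one saved hour: %.3f USD"
          % usd_per_hour_saved(small_hours, large_hours))
```

Seen this way, the question is not "is 17x expensive?" but "what is an hour of my waiting time worth?".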

Alternative 2a: Rackspace cloud

We of course all know about the Amazon cloud, but I wanted to compare it to something. Since Rackspace sponsors both Drizzle and OpenStack - and their service is based on OpenStack - I would have preferred to spend my money here. (Amazon is not open source, but the open source project OpenStack provides an EC2 compatibility layer for its own HTTP API.)

Pro: Cheaper, plus more granularity in the instance options. The latter saves you money since you can pick precisely the kind of instance you need and not pay for anything extra.

Equal: The original Amazon S3 based "instance store" is difficult to manage. Essentially you can launch servers, and once you stop them they disappear. Saving any data between runs is complex. Rackspace Cloud Servers use traditional shared storage and can be stopped and rebooted without losing your data. This is much, much better. However, today Amazon also provides this; it is called EBS (for Elastic Block Store).

Con: At the time I started this, Rackspace only had a Web GUI for launching servers, but no programmable/scriptable REST API. I suspected this would be problematic - and indeed I do use this feature now on Amazon. Note that as of last month, Rackspace does have a REST API: https://www.rackspace.com/cloud/cloud_hosting_products/servers/api/.

Alternative 2b: Amazon EC2 cloud

As you already know, this is the one I ended up choosing. In addition to Rackspace lacking the REST API at the time, I figured that "it is what everyone else is using" was a powerful pro-Amazon argument too.

Indeed, Ubuntu for instance provides official images that you can just launch without spending any time on installation or configuration. They include cloud-friendly tweaks compared to the standard Ubuntu installation - for instance they are configured to use an APT repository inside Amazon's data center. This makes running "apt-get upgrade" amazingly fast: the download takes literally less than a second. Then of course unpacking the debs and running the installation scripts takes its own time...

So what does it cost?

I now have 3 months' worth of billing data from Amazon. So what does it really cost?

Well, the first month was less than 1 USD since I just used small t1.micro instances, which are free in the beginning thanks to an Amazon promotion. January and February have cost me roughly 200 USD each. So in 2 months I've spent about the same money I could have spent on the cheap server in Alternative 1a. There are 2 things to say about that: 1) It is money well spent as the builds are faster in the cloud than on a cheap single CPU server, and 2) I could have spent much less if I had optimized the money side of this.

There are several reasons why I spent much more money than I should have. The biggest culprit is bzr bug 367545. For some reason, when you branch the MySQL sources from Launchpad, Bzr eats up 800+ MB of RAM (essentially the whole bzr repository is stored in memory). Considering this is essentially just a download operation, it's really ridiculous for Bzr to use that much memory. If your computer has less memory - like a cheap t1.micro instance has - then bzr gets killed and the branch operation fails.

I mentioned above that Amazon doesn't offer as much granularity when choosing instances as Rackspace. For 64-bit platforms the next size after t1.micro is m1.large, which costs 17x more! So this bzr bug has probably cost me around 380 dollars, as it was the main reason I had to use the m1.large instances so much!

I was also rather careless in my usage of the EC2 servers. While a build system is an excellent use case for the cloud, my usage of it was not. Being on paternity leave I actually left the servers running idle for large amounts of time. I would code a little, start a build, go and take care of the kids, code a little more, go to sleep, come back 24 hours later to code a little... If I had done this as a work project my use of EC2 would probably have been much more efficient, as I would have shut down the servers when not needed.
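
To get a feel for what idling costs, here is a rough sketch. It assumes the m1.large price is 17x the 0.02 USD/hour figure mentioned earlier and ignores storage and bandwidth charges; the 4 hours/day of real usage is a made-up number for illustration only.

```python
# Assumed m1.large price: 17x the 0.02 USD/hour quoted earlier in this post.
LARGE_USD_PER_HOUR = 0.02 * 17


def monthly_cost(hours_per_day, days=30, usd_per_hour=LARGE_USD_PER_HOUR):
    """Instance cost for one month, given how many hours per day it is running."""
    return hours_per_day * days * usd_per_hour


if __name__ == "__main__":
    always_on = monthly_cost(24)    # server never shut down
    only_when_used = monthly_cost(4)  # hypothetical: 4 hours of real work per day
    print("left running 24/7:   %.2f USD/month" % always_on)
    print("running 4 hours/day: %.2f USD/month" % only_when_used)
    print("spent on idle time:  %.2f USD/month" % (always_on - only_when_used))
```

Under these assumptions a single instance left on around the clock comes to roughly 245 USD per month, which is in the same ballpark as my actual bills.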

Towards the end I implemented a workaround for the bzr bug, which allowed me to use t1.micro instances again: I started keeping around a tar file with a shared MySQL repository in it. It also cut the build time by more than half (from 2 hours to 50 minutes), since until then most of the build time went into bzr downloading a new repository from Launchpad! But even though it was now possible to use the t1.micro instances, I decided to use the m1.large instance type anyway, as builds would complete 3-4 times faster. During development, this allowed me to progress faster as the wait time was reduced.
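
The idea behind the tar file workaround can be sketched roughly like this. This is not the actual MepSQL build script: the function names and paths are hypothetical, and the bzr step that would follow (branching into the restored shared repository) is left out.

```python
import tarfile
from pathlib import Path


def save_repo_cache(repo_dir, archive):
    """Pack repo_dir into a gzip-compressed tarball for use by later builds."""
    repo_dir = Path(repo_dir)
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the repository's parent directory.
        tar.add(repo_dir, arcname=repo_dir.name)


def restore_cached_repo(archive, workdir):
    """Unpack a previously saved repository tarball into workdir.

    Returns True if the cache was restored, or False if no archive exists,
    in which case the caller has to fall back to a full download.
    """
    archive = Path(archive)
    workdir = Path(workdir)
    if not archive.is_file():
        return False
    workdir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions: use the safe extraction filter.
            tar.extractall(workdir, filter="data")
        else:
            tar.extractall(workdir)
    return True
```

A build would call `restore_cached_repo()` first and only download the history that is newer than the cached copy, so the memory-hungry full branch operation is never needed on the small instance.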

In a sense, you might say that the cloud allows you to flexibly choose whether to spend time to save money, or spend money to save time :-)

Updated Feb 23: Found out from Jay Pipes that OpenStack does provide an EC2 compatibility layer in its HTTP API, so I removed a reference to Eucalyptus and mention OpenStack instead.

You have me convinced that the best option is to buy a server.
Besides, why buy a server for a small project like this? It seems
better to buy a cheap desktop (around 1k$) and buy a fixed IP address
from your internet operator (costs me 4-5$/month).

You can also use the desktop as your local desktop if you virtualize
the work you've done.

Hard to beat that cost except possibly this winter when electricity
bills in Nordic countries have skyrocketed :)

Btw: Interesting one has to pass fairly difficult questions to
pass as human here :) Have to know what a beetle is and what
magenta is. Not sure this is commonplace knowledge for
non-english speakers.

I could of course have just used virtual machines on my own laptop too, but somehow I just wanted to park this project in a separate "space" - so either buy a server or rent a virtual one.

In the end, the main reason to use a cloud service for this kind of project is the flexibility and freedom of choice: When I want to build packages for 20 different Linux variants, I can run 20 servers in parallel for 1 hour instead of 1 server for 20 hours, for the same cost. (Which is a significant saving of time in the development phase, when something breaks, you fix it and then retry, and retry, and retry.)
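
As a small sketch of that arithmetic (the hourly price is an assumed 0.34 USD, i.e. 17x the 0.02 USD/hour mentioned in the post, and billing is assumed to be linear in instance-hours):

```python
def sequential(platforms, hours_per_build, usd_per_hour):
    """One server builds every platform in turn. Returns (wall_hours, cost)."""
    wall_hours = platforms * hours_per_build
    return wall_hours, wall_hours * usd_per_hour


def parallel(platforms, hours_per_build, usd_per_hour):
    """One server per platform, all started at once. Returns (wall_hours, cost)."""
    wall_hours = hours_per_build
    cost = platforms * hours_per_build * usd_per_hour
    return wall_hours, cost


if __name__ == "__main__":
    price = 0.34  # assumed USD/hour, see the note above
    for name, plan in (("sequential", sequential), ("parallel", parallel)):
        wall_hours, cost = plan(20, 1.0, price)
        print("%-10s wait %4.1f h, pay %.2f USD" % (name, wall_hours, cost))
```

The number of instance-hours, and therefore the bill, is the same in both cases; only the waiting time differs.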

If you just need server time, then Amazon is not cheap at all. Rackspace is better, but even then having your own server will be cheaper in the long run. In any case, my usage of EC2 was very wasteful during these months: I'd leave servers idle for a day just because I didn't want to shut them down for every family interruption there was.

This is kinda covered, though I didn't specify what software I would have run on the server I would have bought. Of course, in practice I would then have just copied the MariaDB system, which is based on KVM plus shell scripts.

But the nice thing about OpenStack (and Eucalyptus) is that now that I did something in the public cloud, my build system is still usable on a private server with these tools.

OpenStack in particular was the reason to consider Rackspace cloud. Too bad the REST API didn't exist in November when I started this. Now all I can say is, I hope OpenStack will maintain an EC2 compatibility API, and not just diverge into a competing API.

OpenStack supports both the Rackspace Cloud Servers REST API (called the OpenStack API) and the Amazon EC2 API, and has for some time. Neither is going away. However, since the community actually has some say in the direction and substance of the OpenStack API, I imagine that is the API that will have the fastest improvements to it :)


Cat Bird (not verified)

Wed, 2011-03-02 06:01

I would use a Cloud only if the information wasn't sensitive (could be written on a postcard).

Since MySQL does not have table encryption, I'd be hesitant storing anything on a cloud. Also you can be sure bonelamb sekurty gets a copy of your database every month, including a list of your users and their email addresses and what they looked at (if you are tracking that). No warrant needed. EBay hands over their data every month and no one is the wiser. Not only of the purchasers/sellers on ebay, but also what people are looking for. Welcome to big brodder. :)
