4 performance fixes to MySQL on large servers

Yesterday I posted results from some MySQL benchmarks I had been doing on a large server. In this post I'd like to list 4 important fixes that were done to avoid bad performance:

  1. Linux swapping, NUMA: If you have more than 1 CPU socket, you will be using Non-Uniform Memory Allocation. Linux has a bad default strategy of allocating memory here, and it is easy to end up in a situation where you have many GB of free memory left, yet Linux is already swapping. Yes, seeing that happen will be very confusing! This problem has been investigated and explained brilliantly by Jeremy Cole. The solution is a one liner given at the end of his article.
  2. Swappiness: This is not a bug, just easy to forget, or easy to assume someone else already set this. By default, Linux will think it is a good idea to swap out some processes (especially MySQL, that consumes a lot of memory) in favor of the filesystem cache, so that your disk access becomes faster. This is a good idea in many cases, but not when you're running a database. So you should set vm.swappiness=0 in /etc/sysctl.conf. This was covered in Yoshinori's tutorial at the MySQL conference.
  3. Swappiness, Linux bug: On kernels prior to 2.6.28 (such as CentOS 5...) Linux will still swap out MySQL regardless of vm.swappiness, because the algorithm for balancing swapping and filesystem cache is stupid. The fix is to upgrade your kernel. A workaround is again offered by Yoshinori to those who can't upgrade: A perl script you can run from cron.
  4. InnoDB contention on 100+ GB buffer pool: It seems that InnoDB performance will severely degrade under some conditions if your buffer pool is 100 GB or more (exact limit unknown). This was originally observed by Vadim. The contention is possibly triggered when your system has low latencies when writing to disk, or when using ext3. A workaround is to make the buffer pool smaller - 64 GB has been verified to work well.

I'd like to thank Jeremy Cole, Yoshinori Matsunobu and Vadim Tkachenko for sharing their experiences and making it easy to fix what would otherwise have been very difficult problems.

No, I have exactly the same stupid question :-)

Reading between the lines of Jeremy's post, it seems the Linux devs think that --interleave has worse performance than the default "allocate all memory from your own NUMA node". But Jeremy's benchmarks do not show any difference. Even so, I totally agree with Jeremy's point that even if you do allocate all of the memory from one NUMA node, then swapping is the worst solution to think of, at least then Linux should continue to allocate memory from the other available NUMA nodes.

Anonymous friend (not verified)

Mon, 2014-01-27 22:02

running with --interleave might cause performance issues due to node NUMA distances

Many people say it might (although none has been observed) otoh swapping MySQL to disk will certainly cause performance issues. The current Linux behavior is simply not defensible.

Add new comment

The content of this field is kept private and will not be shown publicly. Cookie & Privacy Policy
  • No HTML tags allowed.
  • External and mailto links in content links have an icon.
  • Lines and paragraphs break automatically.
  • Web page addresses and email addresses turn into links automatically.
  • Use [fn]...[/fn] (or <fn>...</fn>) to insert automatically numbered footnotes.
  • Each email address will be obfuscated in a human readable fashion or, if JavaScript is enabled, replaced with a spam resistent clickable link. Email addresses will get the default web form unless specified. If replacement text (a persons name) is required a webform is also required. Separate each part with the "|" pipe symbol. Replace spaces in names with "_".
About the bookAbout this siteAcademicAmazonBeginnersBooksBuildBotBusiness modelsbzrCassandraCloudcloud computingclsCommunitycommunityleadershipsummitConsistencycoodiaryCopyrightCreative CommonscssDatabasesdataminingDatastaxDevOpsDrizzleDrupalEconomyelectronEthicsEurovisionFacebookFrosconFunnyGaleraGISgithubGnomeGovernanceHandlerSocketHigh AvailabilityimpressionistimpressjsInkscapeInternetJavaScriptjsonKDEKubuntuLicensingLinuxMaidanMaker cultureMariaDBmarkdownMEAN stackMepSQLMicrosoftMobileMongoDBMontyProgramMusicMySQLMySQL ClusterNerdsNodeNoSQLodbaOpen ContentOpen SourceOpenSQLCampOracleOSConPAMPPatentsPerconaperformancePersonalPhilosophyPHPPiratesPlanetDrupalPoliticsPostgreSQLPresalespresentationsPress releasesProgrammingRed HatReplicationSeveralninesSillySkySQLSolonStartupsSunSybaseSymbiansysbenchtalksTechnicalTechnologyThe making ofTungstenTwitterUbuntuvolcanoWeb2.0WikipediaWork from HomexmlYouTube