Hand holding PostgreSQL on a simple query

With MySQL it was for a long time the case that a lot of sub queries would actually perform poorly, because of poor execution plans. (This is no longer the case in MariaDB 5.5 or the upcoming MySQL 5.6.) Because of this, any MySQL DBA knows the rule of thumb that sub-queries should basically be avoided and you can usually get the same result by using JOINs instead.

I've now learned why PostgreSQL DBAs like sub queries so much. PostgreSQL - being the most advanced open source database - apparently does the exact opposite optimizations as MySQL: it requires you to rewrite simple queries into complex subqueries to get what you want. (Update: Mark Callaghan points out that MySQL - while it does create indexes automatically for foreign keys - actually has the same problems with the query plan as Postgres has in this post. See comments for details.)

I managed to greate a PostGIS database with a Location table and a Point table. The Point table has a foreign key, LocationId, which references records in the Location table. This is a 1:n relationship. In other words, for each Location (street address, zip code, you know) there can be several points.

The Point table has 358 million records. So if a query doesn't use indexes properly, you will notice. This is what the table looks like:

gc=# SELECT *, st_astext("p") FROM "Point" LIMIT 10;
    id     | MatchLevel |        p         | LocationId |        st_astext         
-----------+------------+------------------+------------+--------------------------
 253432873 | street     | 01010000...04440 |    6544580 | POINT(-74.06805 41.6267)
 253432897 | street     | 01010000...04440 |    6544580 | POINT(-74.06795 41.6267)
 253432918 | street     | 01010000...04440 |    6544580 | POINT(-74.06785 41.6267)

Ok, so I wanted to find out how many points belong to the location with id 1:

rgc=# SELECT *, st_astext("p") FROM "Point" ORDER BY "LocationId" LIMIT 10;
^CCancel request sent
ERROR:  canceling statement due to user request
rgc=# EXPLAIN SELECT *, st_astext("p") FROM "Point" ORDER BY "LocationId" LIMIT 10;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=15688004.97..15688005.00 rows=10 width=145)
   ->  Sort  (cost=15688004.97..16582377.05 rows=357748830 width=145)
         Sort Key: "LocationId"
         ->  Seq Scan on "Point"  (cost=0.00..7957181.38 rows=357748830 width=145)
(4 rows)

Query takes really long... Ok, EXPLAIN says it was going to do a full table scan. WTF?

Turns out, PostgreSQL doesn't actually create indexes automatically for a foreign key. You have to do the CREATE INDEX statement separately.

a) Does the word "KEY" in "FOREIGN KEY" mean nothing to you?

b) Was there ever - ever - a case where a foreign key doesn't need to be indexed? Ever? Especially when I use ON UPDATE CASCADE?

Anyway, I add the index and get my results.

rgc=# CREATE INDEX "fkey_locationid_ix" ON "Point" ("LocationId");
CREATE INDEX
rgc=# 
rgc=# EXPLAIN SELECT *, st_astext("p") FROM "Point" ORDER BY "LocationId" LIMIT 10;
                                                QUERY PLAN 
----------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4.37 rows=10 width=145)
   ->  Index Scan using fkey_locationid_ix on "Point"  (cost=0.00..156298416.64 rows=357797696 width=145)
(2 rows)

rgc=# SELECT "LocationId", COUNT("LocationId") FROM "Point" GROUP BY "LocationId" 
      ORDER BY "LocationId" LIMIT 10;
 LocationId | count 
------------+-------
          1 |  8779
          2 |   318

Ok. Now I also wanted to know how many points belong to LocationId 1 and had a certain MatchLevel. Easy enough:

rgc=# SELECT "LocationId", "MatchLevel", COUNT("LocationId") FROM "Point" 
      GROUP BY "LocationId", "MatchLevel" ORDER BY "LocationId", "MatchLevel" LIMIT 100;
...

rgc=# EXPLAIN SELECT "LocationId", "MatchLevel", COUNT("LocationId") FROM "Point" 
      GROUP BY "LocationId", "MatchLevel" ORDER BY "LocationId", "MatchLevel" LIMIT 100;
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Limit  (cost=10009754756.34..10009754756.59 rows=100 width=13)
   ->  Sort  (cost=10009754756.34..10009755169.85 rows=165404 width=13)
         Sort Key: "LocationId", "MatchLevel"
         ->  HashAggregate  (cost=10009746780.68..10009748434.72 rows=165404 width=13)
               ->  Seq Scan on "Point"  (cost=10000000000.00..10007063297.96 rows=357797696 width=13)
(5 rows)

Now what? This should be an easy case for a database. Just use the same index on LocationID, then filter me the results with the given MatchLevel. But it doesn't happen. Checking EXPLAIN show a seq scan again (ie full table scan).

Ok, that's just silly. So maybe I can FORCE INDEX (or HINT, if you are Oracle DBA). Nope, they don't have that in Postgres. You can just generally enable and disable use of indexes vs sequential scans:

rgc=# set enable_seqscan=false;

But it gives the same results. Of course. If Postgres wasn't smart enough to use an index the first time, saying "pretty please" is not gonna make a difference here.

So I have to really handhold Postgres and create a sequence of sub queries to explicitly explain that hey, you can first get these records via this indexed column, and then filter out (or group by, which the example is about) the MatchLevel stuff on the next level.

rgc=# SELECT "LocationId", "MatchLevel", COUNT(*) FROM (SELECT "LocationId", "MatchLevel" FROM "Point"
 ORDER BY "LocationId" LIMIT 1000000) AS "Sub" GROUP BY "LocationId", "MatchLevel" ORDER BY "LocationId";
 LocationId | MatchLevel  | count  
------------+-------------+--------
          1 | street      |   8779
          2 | street      |    318
...

Another simple query, which I won't explain here, ended up being a 4 level sub query. Pretty ugly.

But hey, Postgres is really good at executing these complex and advanced queries, that's for sure.

Anonymous friend (not verified)

Fri, 2012-09-07 14:47

Your last query uses "ORDER BY LocationId" while the previous one "ORDER BY LocationId, MatchLevel". This is a big difference. I suppose an index on (LocationId, MatchLevel) would solve the issue.

Of course Pg would like it if I added that index, but that's still no excuse to ignore the next best index, which is still much more efficient to use than trying to sort 358M rows on the fly. In fact, my data turns out to be a bit pathetic, so that the MatchLevel column actually has the same value for any given LocationId. Therefore an index on (LocationId, MatchLevel) would be completely redundant and that's why I don't want to add it.

Ok, that's a valid point. The 2 column index would still waste more space, but yes, I could drop the 1 column index then.

The database of course doesn't know the second column values end up being all equal (per group) but for my purposes the work involved to sort or filter through 5 to 5000 rows per group is small enough to be acceptable. Which is exactly what happens in the subquery variant, which finishes in the blink of an eye. And this is something the database should know, based on the cardinality of the first index.

No, but there's still GROUP BY ... MatchLevel. This can of course be satisfied in various ways, only one of which involves sorting. But it still involves scanning through each record in order to group and count them.

I can't get MySQL 5.1 to do better than PG. It needs an index on (LocationId, MatchLevel) to avoid a full scan. To do something better it needs hash aggregation or an incremental sort (for example, sort by building a tree) so that the index scan on LocationId as soon as 100 distinct values on (LocationId, MatchLevel) have been read.

My test case:

rop table if exists point;
create table point (id int primary key auto_increment, location int, match_level int);
alter table point add index lx(location);

show create table point\G

insert into point values (NULL, 0, 0);
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;

update point set location=id, match_level=id;

analyze table point;

select * from point limit 10;

show indexes from point;

EXPLAIN SELECT location, match_level, count(location)
FROM point
GROUP BY location, match_level
ORDER BY location, match_level
LIMIT 100\G

EXPLAIN SELECT location, match_level, count(location)
FROM point FORCE INDEX(lx)
GROUP BY location, match_level
ORDER BY location, match_level
LIMIT 100\G

alter table point add index lxm(location, match_level);
show indexes from point;

EXPLAIN SELECT location, match_level, count(location)
FROM point
GROUP BY location, match_level
ORDER BY location, match_level
LIMIT 100\G

Interesting. I get the same on MariaDB 5.5.

Is there any valid reason why this happens and that I'm just too blind to see? I mean, that it fails even with FORCE INDEX means that either me or MySQL is doing something really wrong.

Also when I remove the GROUP BY the result is still the same:

EXPLAIN SELECT * FROM point ORDER BY location, match_level LIMIT 100\G

Ah, I realize you have kind of answered my question already. To me the "incremental sort" or what I might call "nested sort" seems so obvious I was really surprised (in both cases) that it didn't exist. I mean the right query plan here would involve the step:

for each locationid, sort by match_level

That's essentially what I force with the horrible sub query.

I didn't spell it out in the post, but one "lesson learned" from this experience was that I now understand better why people are used to breaking their queries into sub queries. Of course, in Oracle world it is also a way to add parallelization opportunities (and maybe Postgres too, I don't know enough of that one).

Also, every time I learn more about the state of the art of RDBMS optimizers, I understand the point of NoSQL a little better :-)

"Also, every time I learn more about the state of the art of RDBMS optimizers, I understand the point of NoSQL a little better :-)"

This puzzles me.

Are you implying NoSQL would somehow be more efficient here? From what I've seen sofar what you can and can't do more efficiently with NoSQL depends to a large extent on the data structures you choose to store your data. But from this example, I don't see any obvious way to do it better than in a RDBMS. Do you have an example?

cheers,

Roland

I think I just understand people who like to query data with more explicit query languages, regardless of how data is stored. You shouldn't read more into it than that! E.g. in this case I could store the data just like in this example, and then use HANDLER or HandlerSocket or whatever to explicitly tell the database what index to read, when to sort, etc.

My first encounter with Hadoop usage, back when it pretty much didn't exist in Europe yet, was a person explaining that he much prefers Pig over SQL for precisely this reason: when you are running a 5 hour batch query, and 6 hours later you realize that the database must have chosen a different query plan today than it did yesterday, you'd kind of want to know that without wasting the 6 hours.

Of course, in my case here I didn't waste any time, I was just surprised that something that was an obvious query plan to me isn't actually implemented in PostgreSQL nor MySQL.

Add new comment

The content of this field is kept private and will not be shown publicly. Cookie & Privacy Policy
  • No HTML tags allowed.
  • External and mailto links in content links have an icon.
  • Lines and paragraphs break automatically.
  • Web page addresses and email addresses turn into links automatically.
  • Use [fn]...[/fn] (or <fn>...</fn>) to insert automatically numbered footnotes.
  • Each email address will be obfuscated in a human readable fashion or, if JavaScript is enabled, replaced with a spam resistent clickable link. Email addresses will get the default web form unless specified. If replacement text (a persons name) is required a webform is also required. Separate each part with the "|" pipe symbol. Replace spaces in names with "_".
About the bookAbout this siteAcademicAmazonBeginnersBooksBuildBotBusiness modelsbzrCassandraCloudcloud computingclsCommunitycommunityleadershipsummitConsistencycoodiaryCopyrightCreative CommonscssDatabasesdataminingDatastaxDevOpsDrizzleDrupalEconomyelectronEthicsEurovisionFacebookFrosconFunnyGaleraGISgithubGnomeGovernanceHandlerSocketHigh AvailabilityimpressionistimpressjsInkscapeInternetJavaScriptjsonKDEKubuntuLicensingLinuxMaidanMaker cultureMariaDBmarkdownMEAN stackMepSQLMicrosoftMobileMongoDBMontyProgramMusicMySQLMySQL ClusterNerdsNodeNoSQLodbaOpen ContentOpen SourceOpenSQLCampOracleOSConPAMPPatentsPerconaperformancePersonalPhilosophyPHPPiratesPlanetDrupalPoliticsPostgreSQLPresalespresentationsPress releasesProgrammingRed HatReplicationSeveralninesSillySkySQLSolonSunSybaseSymbiansysbenchtalksTechnicalTechnologyThe making ofTungstenTwitterUbuntuvolcanoWeb2.0WikipediaWork from HomexmlYouTube