Hand holding PostgreSQL on a simple query

With MySQL it was for a long time the case that a lot of sub queries would actually perform poorly, because of poor execution plans. (This is no longer the case in MariaDB 5.5 or the upcoming MySQL 5.6.) Because of this, any MySQL DBA knows the rule of thumb that sub-queries should basically be avoided and you can usually get the same result by using JOINs instead.

I've now learned why PostgreSQL DBAs like sub queries so much. PostgreSQL - being the most advanced open source database - apparently does the exact opposite optimizations as MySQL: it requires you to rewrite simple queries into complex subqueries to get what you want. (Update: Mark Callaghan points out that MySQL - while it does create indexes automatically for foreign keys - actually has the same problems with the query plan as Postgres has in this post. See comments for details.)

I managed to greate a PostGIS database with a Location table and a Point table. The Point table has a foreign key, LocationId, which references records in the Location table. This is a 1:n relationship. In other words, for each Location (street address, zip code, you know) there can be several points.

The Point table has 358 million records. So if a query doesn't use indexes properly, you will notice. This is what the table looks like:

gc=# SELECT *, st_astext("p") FROM "Point" LIMIT 10;
    id     | MatchLevel |        p         | LocationId |        st_astext         
-----------+------------+------------------+------------+--------------------------
 253432873 | street     | 01010000...04440 |    6544580 | POINT(-74.06805 41.6267)
 253432897 | street     | 01010000...04440 |    6544580 | POINT(-74.06795 41.6267)
 253432918 | street     | 01010000...04440 |    6544580 | POINT(-74.06785 41.6267)

Ok, so I wanted to find out how many points belong to the location with id 1:

rgc=# SELECT *, st_astext("p") FROM "Point" ORDER BY "LocationId" LIMIT 10;
^CCancel request sent
ERROR:  canceling statement due to user request
rgc=# EXPLAIN SELECT *, st_astext("p") FROM "Point" ORDER BY "LocationId" LIMIT 10;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Limit  (cost=15688004.97..15688005.00 rows=10 width=145)
   ->  Sort  (cost=15688004.97..16582377.05 rows=357748830 width=145)
         Sort Key: "LocationId"
         ->  Seq Scan on "Point"  (cost=0.00..7957181.38 rows=357748830 width=145)
(4 rows)

Query takes really long... Ok, EXPLAIN says it was going to do a full table scan. WTF?

Turns out, PostgreSQL doesn't actually create indexes automatically for a foreign key. You have to do the CREATE INDEX statement separately.

a) Does the word "KEY" in "FOREIGN KEY" mean nothing to you?

b) Was there ever - ever - a case where a foreign key doesn't need to be indexed? Ever? Especially when I use ON UPDATE CASCADE?

Anyway, I add the index and get my results.

rgc=# CREATE INDEX "fkey_locationid_ix" ON "Point" ("LocationId");
CREATE INDEX
rgc=# 
rgc=# EXPLAIN SELECT *, st_astext("p") FROM "Point" ORDER BY "LocationId" LIMIT 10;
                                                QUERY PLAN 
----------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4.37 rows=10 width=145)
   ->  Index Scan using fkey_locationid_ix on "Point"  (cost=0.00..156298416.64 rows=357797696 width=145)
(2 rows)

rgc=# SELECT "LocationId", COUNT("LocationId") FROM "Point" GROUP BY "LocationId" 
      ORDER BY "LocationId" LIMIT 10;
 LocationId | count 
------------+-------
          1 |  8779
          2 |   318

Ok. Now I also wanted to know how many points belong to LocationId 1 and had a certain MatchLevel. Easy enough:

rgc=# SELECT "LocationId", "MatchLevel", COUNT("LocationId") FROM "Point" 
      GROUP BY "LocationId", "MatchLevel" ORDER BY "LocationId", "MatchLevel" LIMIT 100;
...

rgc=# EXPLAIN SELECT "LocationId", "MatchLevel", COUNT("LocationId") FROM "Point" 
      GROUP BY "LocationId", "MatchLevel" ORDER BY "LocationId", "MatchLevel" LIMIT 100;
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Limit  (cost=10009754756.34..10009754756.59 rows=100 width=13)
   ->  Sort  (cost=10009754756.34..10009755169.85 rows=165404 width=13)
         Sort Key: "LocationId", "MatchLevel"
         ->  HashAggregate  (cost=10009746780.68..10009748434.72 rows=165404 width=13)
               ->  Seq Scan on "Point"  (cost=10000000000.00..10007063297.96 rows=357797696 width=13)
(5 rows)

Now what? This should be an easy case for a database. Just use the same index on LocationID, then filter me the results with the given MatchLevel. But it doesn't happen. Checking EXPLAIN show a seq scan again (ie full table scan).

Ok, that's just silly. So maybe I can FORCE INDEX (or HINT, if you are Oracle DBA). Nope, they don't have that in Postgres. You can just generally enable and disable use of indexes vs sequential scans:

rgc=# set enable_seqscan=false;

But it gives the same results. Of course. If Postgres wasn't smart enough to use an index the first time, saying "pretty please" is not gonna make a difference here.

So I have to really handhold Postgres and create a sequence of sub queries to explicitly explain that hey, you can first get these records via this indexed column, and then filter out (or group by, which the example is about) the MatchLevel stuff on the next level.

rgc=# SELECT "LocationId", "MatchLevel", COUNT(*) FROM (SELECT "LocationId", "MatchLevel" FROM "Point"
 ORDER BY "LocationId" LIMIT 1000000) AS "Sub" GROUP BY "LocationId", "MatchLevel" ORDER BY "LocationId";
 LocationId | MatchLevel  | count  
------------+-------------+--------
          1 | street      |   8779
          2 | street      |    318
...

Another simple query, which I won't explain here, ended up being a 4 level sub query. Pretty ugly.

But hey, Postgres is really good at executing these complex and advanced queries, that's for sure.

Log in to post comments
14719 views

Your last query uses "ORDER

Your last query uses "ORDER BY LocationId" while the previous one "ORDER BY LocationId, MatchLevel". This is a big difference. I suppose an index on (LocationId, MatchLevel) would solve the issue.

Log in to post comments

Of course Pg would like it if

Of course Pg would like it if I added that index, but that's still no excuse to ignore the next best index, which is still much more efficient to use than trying to sort 358M rows on the fly. In fact, my data turns out to be a bit pathetic, so that the MatchLevel column actually has the same value for any given LocationId. Therefore an index on (LocationId, MatchLevel) would be completely redundant and that's why I don't want to add it.

Log in to post comments

You can create that index

You can create that index instead of the original index. The database don't know that you have the same value in MatchLevel for given LocationId, so, to sort by this value it can't just use existing index.

Log in to post comments

Ok, that's a valid point. The

Ok, that's a valid point. The 2 column index would still waste more space, but yes, I could drop the 1 column index then.

The database of course doesn't know the second column values end up being all equal (per group) but for my purposes the work involved to sort or filter through 5 to 5000 rows per group is small enough to be acceptable. Which is exactly what happens in the subquery variant, which finishes in the blink of an eye. And this is something the database should know, based on the cardinality of the first index.

Log in to post comments

But your subquery variant

But your subquery variant doesn't do the same. It won't return the same result. It's faster because you don't sort by MatchLevel.

Log in to post comments

No, but there's still GROUP

No, but there's still GROUP BY ... MatchLevel. This can of course be satisfied in various ways, only one of which involves sorting. But it still involves scanning through each record in order to group and count them.

Log in to post comments

MySQL does the same thing

I can't get MySQL 5.1 to do better than PG. It needs an index on (LocationId, MatchLevel) to avoid a full scan. To do something better it needs hash aggregation or an incremental sort (for example, sort by building a tree) so that the index scan on LocationId as soon as 100 distinct values on (LocationId, MatchLevel) have been read.

My test case:

rop table if exists point;
create table point (id int primary key auto_increment, location int, match_level int);
alter table point add index lx(location);

show create table point\G

insert into point values (NULL, 0, 0);
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;
insert into point select NULL, 0, 0 from point;

update point set location=id, match_level=id;

analyze table point;

select * from point limit 10;

show indexes from point;

EXPLAIN SELECT location, match_level, count(location)
FROM point
GROUP BY location, match_level
ORDER BY location, match_level
LIMIT 100\G

EXPLAIN SELECT location, match_level, count(location)
FROM point FORCE INDEX(lx)
GROUP BY location, match_level
ORDER BY location, match_level
LIMIT 100\G

alter table point add index lxm(location, match_level);
show indexes from point;

EXPLAIN SELECT location, match_level, count(location)
FROM point
GROUP BY location, match_level
ORDER BY location, match_level
LIMIT 100\G

Log in to post comments

Interesting. I get the same

Interesting. I get the same on MariaDB 5.5.

Is there any valid reason why this happens and that I'm just too blind to see? I mean, that it fails even with FORCE INDEX means that either me or MySQL is doing something really wrong.

Also when I remove the GROUP BY the result is still the same:

EXPLAIN SELECT * FROM point ORDER BY location, match_level LIMIT 100\G

Log in to post comments

Ah, I realize you have kind

Ah, I realize you have kind of answered my question already. To me the "incremental sort" or what I might call "nested sort" seems so obvious I was really surprised (in both cases) that it didn't exist. I mean the right query plan here would involve the step:

for each locationid, sort by match_level

That's essentially what I force with the horrible sub query.

Log in to post comments

My memory is vague but I

My memory is vague but I thought there was a recent post from the MPAB crew that wound't make this query faster but was similar in spirit. This isn't a trivia feature request but I am sure that MySQL and PG could implement it -- given time and money.

Log in to post comments

I didn't spell it out in the

I didn't spell it out in the post, but one "lesson learned" from this experience was that I now understand better why people are used to breaking their queries into sub queries. Of course, in Oracle world it is also a way to add parallelization opportunities (and maybe Postgres too, I don't know enough of that one).

Also, every time I learn more about the state of the art of RDBMS optimizers, I understand the point of NoSQL a little better :-)

Log in to post comments

NoSQL

"Also, every time I learn more about the state of the art of RDBMS optimizers, I understand the point of NoSQL a little better :-)"

This puzzles me.

Are you implying NoSQL would somehow be more efficient here? From what I've seen sofar what you can and can't do more efficiently with NoSQL depends to a large extent on the data structures you choose to store your data. But from this example, I don't see any obvious way to do it better than in a RDBMS. Do you have an example?

cheers,

Roland

Log in to post comments

Ok - upon closer reading, my

Ok - upon closer reading, my guess is that you'd build on collection of location documents that would have a collection of points, right?
I'd still be interested to know how the query involving the matchlevel could be implemented faster, it seems you still have to scan them all to get the result.

Log in to post comments

I think I just understand

I think I just understand people who like to query data with more explicit query languages, regardless of how data is stored. You shouldn't read more into it than that! E.g. in this case I could store the data just like in this example, and then use HANDLER or HandlerSocket or whatever to explicitly tell the database what index to read, when to sort, etc.

My first encounter with Hadoop usage, back when it pretty much didn't exist in Europe yet, was a person explaining that he much prefers Pig over SQL for precisely this reason: when you are running a 5 hour batch query, and 6 hours later you realize that the database must have chosen a different query plan today than it did yesterday, you'd kind of want to know that without wasting the 6 hours.

Of course, in my case here I didn't waste any time, I was just surprised that something that was an obvious query plan to me isn't actually implemented in PostgreSQL nor MySQL.

Log in to post comments

About the bookAbout this siteAcademicAccordAmazonAppleBeginnersBooksBuildBotBusiness modelsbzrCassandraCloudcloud computingclsCommunitycommunityleadershipsummitConsistencycoodiaryCopyrightCreative CommonscssDatabasesdataminingDatastaxDevOpsDistributed ConsensusDrizzleDrupalEconomyelectronEthicsEurovisionFacebookFrosconFunnyGaleraGISgithubGnomeGovernanceHandlerSocketHigh AvailabilityimpressionistimpressjsInkscapeInternetJavaScriptjsonKDEKubuntuLicensingLinuxMaidanMaker cultureMariaDBmarkdownMEAN stackMepSQLMicrosoftMobileMongoDBMontyProgramMusicMySQLMySQL ClusterNerdsNodeNoSQLNyrkiöodbaOpen ContentOpen SourceOpenSQLCampOracleOSConPAMPParkinsonPatentsPerconaperformancePersonalPhilosophyPHPPiratesPlanetDrupalPoliticsPostgreSQLPresalespresentationsPress releasesProgrammingRed HatReplicationSeveralninesSillySkySQLSolonStartupsSunSybaseSymbiansysbenchtalksTechnicalTechnologyThe making ofTransactionsTungstenTwitterUbuntuvolcanoWeb2.0WikipediaWork from HomexmlYouTube

OpenLife.cc

Open Source and Databases