Comparing open source GIS implementations

In my quest to understand spatial GIS functionality, I have come to the ultimate goal: evaluation the actual database products themselves.

PostgreSQL / PostGIS

PostGIS is a variant of PostgreSQL with spatial extensions. The main reason for maintaining the GIS feature set outside of PostgreSQL proper seems to be licensing: the spatial extensions are ~~LGPL~~ GPL licensed.

PostGIS is widely recognized as the most mature and feature-rich GIS implementation for SQL databases (and perhaps any database), matched only by the costly Spatial extension for the Oracle Enterprise database. (See comparisons in the links below.)

For instance PostGIS supports both Geometry (x, y) and Geography (lat, long) types and functions. The functions take into account the specified Spatial Referencing System Identifier (SRID) and calculate distances using the correct projections.

Updated 2012-08-13: See comments section, especially from reader "Regina", with information about PostgreSQL 9.1 improvements wrt the criticism below, which is based on PostgreSQL 8.4 / PostGIS 1.5.

The biggest disappointment in PostGIS turns out to be that the GiST R-tree index only supports bounding rectangle operations. 1 Similarly, since a lot of the GIS functionality is implemented using PL/pgSQL it must be taken into use by executing various *.sql files that are supplied. This again leads to maintenance problems when upgrading to newer versions: after upgrading the PostGIS binary, one must again execute additional *.sql as part of the upgrade, or worse, perform a full DUMP and RESTORE. (In PostGIS defense, it should be noted that doing a full DUMP and RESTORE is common practice with normal PostgreSQL upgrades too.) To round it all off, I also failed to find Postgis in the common Centos repositories. The manual only instructs you to compile it from source - big minus in user friendliness! I eventually found the OpenGeo project which provides a yum repository that includes opengeo-postgis rpm packages: https://suite.opengeo.org/docs/usermanual/installation/linux/postgis-centos.html (Update: Again, commenter "Regina" informs that the official PostgreSQL yum reporitories do in fact contain also PostGIS binaries.) All this makes PostGIS appear as a bolted on hack on top of standard PostgreSQL, which perhaps is not an entirely wrong perception. Compared with MySQL, then, PostGIS is no doubt more featureful but still a bit of a disappointment due to complexity of use. The lack of native support in the GiST R-tree index for other than bounding rectangle searches seems annoying and the resulting workarounds clearly are against the declarative spirit of SQL. Since it is possible for anyone to do simple bounding box operations with any RDBMS simply by defining and indexing 4 columns [xmin, ymin, xmax, ymax] (or, perhaps using a quad-tree approach) and querying these with normal SQL, a compelling reason to use PostGIS is certainly lost there. On the other hand the support for using correct projections with supplying the SRID, and support for Geography types, are unique PostgreSQL strengths not provided in other open source alternatives (except SQLite, which is based on the same libraries). For more information: https://postgis.refractions.net/documentation/ https://postgis.refractions.net/documentation/manual-2.0/using_postgis_dbmanagement.html#id2673440 https://www.bostongis.com/PrinterFriendly.aspx?content_name=sqlserver2008_postgis_mysql_compare https://www.bostongis.com/PrinterFriendly.aspx?content_name=sqlserver2008r2_oracle11gr2_postgis15_compare

MySQL

MySQL implements a subset of the OpenGIS "Simple feature access" specification. A multitude of Geometry types is supported, but only on a 2D plane. The Geography type systems (lat, long) are not implemented. Also within the Geometry implementation, MySQL has only supported functionality based on Minimum Bounding Rectangle operations. Thus, two objects that do not actually interstect, might still be returned as intersecting by the MBRIntersects() function if their bounding rectangles would intersect. This has been the primary reason for criticizing MySQL's spatial implementation as naive. It seems that as of 5.6.1 (beta) MySQL implements proper OpenGIS functions in this area. (https://dev.mysql.com/doc/refman/5.6/en/functions-for-testing-spatial-relations-between-geometric-objects.html#functions-that-test-spatial-relationships-between-geometries) MariaDB also implemented these functions independently a while ago in version 5.3.3. A nice webcast of using these functions is available at https://blog.mariadb.org/screencast-mariadb-gis-demo/ This is a significant improvement in terms of MySQL spatial functionality. The other key drawback in MySQL's spatial functionality is the lack of spatial indexes for InnoDB tables. While all storage engines support Geometry column types, and the spatial functions can of course operate on those types regardless of the underlying engine, R-tree indexing capability was only developed for MyISAM. This is unfortunately still the case for MySQL 5.6 and MariaDB 5.3. Finally, MySQL ignores the SRID if specified and performs all calclulations using Euclidean geometry. For many applications this can be considered a minor issue, but it should be noted the error compared to real world distances does get quite large towards the poles. For some applications then, this inaccuracy might be a showstopper. In practical terms, while you might not care about the accurace as such, it might be a showstopper if your application needs to interact or consume data with another system that is already in a more accurate projections (such as the WGS84 Mercator projection) and you need to use the same projection to remain compatible. (OpenStreetMaps and our Nokia Maps use this projection, and I think Google Maps uses something similar but more complicated.) As such, the lack of R-tree support for InnoDB tables could still be considered a severe weakness for most GIS usage with MySQL. Even if an organization were to exceptionally allow use of MyISAM tables for GIS, the mixing of storage engines at minimum adds cognitive overhead, and for anything but a read-only workload the lack of transactions and crash-safety is very problematic indeed. A variation - still more complex, but safer - of the above would be to use InnoDB tables as primary storage for all data, including geometric types, and then maintain a copy of that data in a MyISAM table. This could possibly be automated with database triggers, but one still has to be mindful about MyISAM's poor read-write performance. Another variation would be to store the geometry types into MyISAM tables (with indexes) and link them to the main InnoDB tables with a foreign key relationship. Yet another approach to use GIS with MySQL would be to keep using InnoDB tables, and add some application level workaround to compensate for lack of R-tree indexes. An obvious idea then is to encode the bounding rectancle of geometric objects as a quad-tree, which could be stored in a normal B-tree index. This could to some extent even be done with database triggers. In the searching phase the application would then have to first search the B-tree index and then filter the result set further by comparing shapes one-by-one with functions like st_intersects(). Given that the PostGIS R-tree index only works on bounding rectangles too, this solution is comparably equal, even if more "hand made". The one thing that clearly stands out in favor of MySQL is the user friendliness. The Geometric types and functions are just there, ready to be used. They don't need to be downloaded from a separate place and manually installed, and there's no additional gymnastics to do at every upgrade. Given that PostGIS implementation isn't perfect either, this benefit can be considered MySQL's strong point. In conclusion then, it seems that if accurate positioning calculations are needed, then PostGIS is without competition in the open source world. On the other hand, with the recent addition to MySQL of true spatial functions other than just bounding rectangles, the user friendliness of MySQL is compelling when Euclidean 2D geometrics is sufficient. For the future, I spoke with Holyfoot from the MariaDB team who told me MariaDB is working on supporting different projections. I do not have information on anyone working on an InnoDB implementation of the R-tree index, but I wouldn't be surprised if Oracle is secretly doing just that. (This is based on knowing they recently released another MyISAM-only feature, the full text indexing, for InnoDB and the general trend to discontinue MyISAM support and focus on InnoDB.) For more information: https://dev.mysql.com/doc/refman/5.6/en/spatial-extensions.html https://blog.mariadb.org/screencast-mariadb-gis-demo/ https://kb.askmonty.org/en/gis-features-in-533/

SQLite / Spatialite

Spatialite is a variant of the SQLite database that provides GIS features. It uses the GEOS, PROJ.4, etc... libraries that are also used in PostGIS, and hence the resulting featureset is similar. In particular, spatialite supports different geometric reference systems (the SRID values) and correctly handles projections into each reference system. Only Geometry types (2D x,y operations) are supported, the Geography type operations are not. The R-tree index only stores bounding rectangles. What's more, while the R-tree is maintained automatically, it is not actually included in query plans by the SQLite optimizer. Instead, the user has to explicitly query a table that is the R-tree index, and include this into the actual query as a sub-query or join. Examples of rewriting a normal SQL query to one explicitly using the R-tree index are given in the tutorial below. While it is not useful to compare SQLite to client-server databases like PostgreSQL and MySQL, the advanced features in SQLite can indirectly impact the choice of Postgres vs MySQL: Imagine an application that uses SQLite/spatialite on the client side, for instance to provide offline access to a subset of a GIS database. In this case it would be wise to use PostGIS for the server side implementation, since it would be guaranteed to produce identical results as spatialite. Otoh MySQL's Euclidean-only geometrics would introduce a risk of producing different results for spatial operations compared to the client side spatialite operations. The lack of support for Geography projections (lat, long) is a clear minus, since in practice the WGS84 projection is used for significant data sets. Read more: Extensive tutorial: https://www.gaia-gis.it/gaia-sins/spatialite-tutorial-2.3.1.html Quick tutorial: https://underdark.wordpress.com/2010/06/23/spatialite-quick-start/

BerkeleyDB

As it is possible to use SQLite as a frontend to BerkeleyDB, it is also possible to use the Spatialite extensions as described above. There is no further GIS support beyond this. For more info: https://docs.oracle.com/cd/E17076_02/html/installation/build_unix_sql.html

MongoDB

MongoDB markets itself as having spatial extensions, which were sponsored by Foursquare, no less. It turns out this feature boils down to the ability to index 2D points, and querying records within a distance of a given point. (Ie. find friends or cafees within a 5km radius.) It seems MongoDB's implementation is similar to using quad-tree encoding over a B-tree, but instead they encode locations as GeoHashes. For instance the hash "ud9wrc8" is the location of Helsinki: https://geohash.org/ud9wrc8 There is no support for other geometric objects than points, and thus also no support for related GIS functions such as intersects(). As a curiosity, the implementation also cannot wrap around the +180 longitude to -180. As pointed out in the PostGIS section above, this feature is essentially doable with any SQL database without using any GIS functions. Otoh for a NoSQL database it is no doubt a helpful enabler. For more information: https://www.mongodb.org/display/DOCS/Geospatial+Indexing/ The PostGIS documentation explains GeoHash in more detail: https://postgis.refractions.net/documentation/manual-2.0/ST_GeoHash.html

Solr / Lucene

Solr also provides functionality to filter search results by distance from a given point. This can also be combined with other filters, for instance proximity can be used to boost search results, while not absolutely excluding remotely located points if they match the other search criteria. There is no support for other geometric types than 2D points, nor the related GIS functions. The feature provided by Solr seems like a perfect fit for Point-of-Interest type of applications, since it combines the ability to find a location - in particular, a point - with the ability to do text searches. A nice and short tutorial of Solr capabilities are given at: https://derickrethans.nl/spatial-indexes-solr.html For more info: https://wiki.apache.org/solr/SpatialSearch/

Libraries: GEOS

GEOS is a well known C++ library that implements the OpenGIS Simple Features Specification, ie it provides both the Geometry types and related functions. It is a C++ port of the pre-existing Java Topology Suite. GEOS is used in PostGIS, whereas MySQL implements its own functionality. There is a family of related libraries provided by the OSGEO project, such as PROJ.4 used to transform between different projections. For more info: https://trac.osgeo.org/geos/

JTS

Java Topology Suite is a Java Library that implements the OpenGIS Simple Features Specification. For more info: https://tsusiatsoftware.net/jts/main.html

1 https://postgis.refractions.net/documentation/manual-2.0/using_postgis_…] This is a bit annoying, but in practice a developer would then have to construct a WHERE clause that first contains a ST_DWithin() against a bounding rectangle and then add a logical AND using the appropriate GIS function against the actual object, which may be something else than a rectangle. I.e:

  SELECT ... WHERE 
  ST_DWithin(shape_column,  
                     ST_MakeEnvelope(90900, 190900, 100100, 200100,312), 100)
  AND
  ST_Distance(shape_column, 
                       ST_GeomFromText('POINT(100000 200000)', 312)) 

In the above, the first part of the WHERE clause (Within) would select anything inside the given rectangle, and this can be done with an index scan. The second part (Distance) would then be used to filter the result set further (in this case, including objects within a circular radius instead of a rectangle).

The manual also mentions that the PostgreSQL query planner will often fail to actually use the index at all, why it may be necessary to force the using of an index or alternatively modify the cost parameters of the query planner.

Finally, the use of PostGIS extensions seems to come with various SQL overhead that may be the norm in PostgreSQL and Oracle land, but appears as quite bad usability when coming from a MySQL or NoSQL background. Consider for instance that only in recent development PostGIS versions is the creation of Geometry columns actually done in CREATE TABLE syntax, but instead a separate AddGeometryColumn() procedure has until now been called.https://postgis.refractions.net/documentation/manual-2.0/AddGeometryColumn.html

Log in to post comments
49815 views

Rectangles being the

Rectangles being the fundamental spatial object for the R-tree, PostGIS merely leaks this R-tree property all the way to the application developer. It could be a case with complex spatial queries that an explicitly-done "derectanglization" at a single point, i.e. right before delivering the result set, is much more efficient than query optimizer filtering each index result. Of course, the same should be achievable with spatial-aware query optimizer too, or at least it could be hintable.

I am not sure I agree with "Since it is possible for anyone to do simple bounding box operations with any RDBMS simply by defining and indexing 4 columns [xmin, ymin, xmax, ymax] (or, perhaps using a quad-tree approach) and querying these with normal SQL, a compelling reason to use PostGIS is certainly lost there." I think that the high performance of the spatial operations is the most compelling reason to use any spatial extensions, even if rectangles is the only kind object being indexed.

How can the performance be

How can the performance be higher than:

WHERE x <= xmin AND x => xmax AND y >= ymin AND y >= ymax

...assuming both alternatives are properly indexed?

For range queries, if you

For range queries, if you have a multiple-column index, it will help you with the 1st dimension only and the index scan will be quite large to select the answer by the remaining dimensions. If you have separate indexes for each dimension, then index merge might help a bit, but the update becomes more expensive. Whereas the R-tree would answer the range query with a single dive, very much like a B-tree would answer a 1D range query.

Now for NN-queries and the like, I don't think there's any way to make these fast without spatial indexing.

Ok, you're right if the first

Ok, you're right if the first dimension is a big range, then you'll end up jumping into multiple smaller ranges for the second dimension. I was thinkin too much of a point, where there's only 1 match also for the second dim.

Quality

Very impressive. Great article. Thanks

Je vous présente mon site univ wordpress
Ce site m'appartient en plus de tous les autres, il est dédié Thème Wordpress, et que celui-ci apporte de très très nombreuses ressources intéressantes que j'aimerais partager avec vous.

Sphinx has support for

Sphinx has support for searching within a distance of a point. I'm not intimately familiar with it, or how it differs from MongoDB's functionality, but I have always mentally grouped the two together in that arena.

I didn't look at Sphinx, but

I didn't look at Sphinx, but my uninformed assumption is that it provides the same functionality as Solr/Lucene. MongoDB is kind of the same, but as far as I know you can't do full text search with MongoDB, you can just combine a distance search with filtering against whatever other field values you happen to have there. (If I'm simply uninformed and MongoDB has full text search, then it is also the same.)

Couple of erroneous statements

1) PostGIS is GPL not LGPL (though it does build on parts like GEOS that are LGPL)
2) SELECT ... WHERE
ST_DWithin(shape_column,
ST_MakeEnvelope(90900, 190900, 100100, 200100,312), 100)

This is sufficient since around PostGIS 1.3 (no need to add an ST_Distance). The ST_DWithin already has a bounding box and a real short-circuit distance check built in. So though R-Tree is just bbox based, ST_DWithin wraps both.

3)Most users using latest PostgreSQL on Centos et.c I know use the postgresql official Yum repository. WHICH DOES include the latest PostGIS http://yum.postgresql.org/

4) PostgreSQL 9.1+ has the

CREATE EXTENSION postgis;

which is all you need to enable postgis in a database.

Hi Regina. Thanks a lot for

Hi Regina. Thanks a lot for your hints. These are very helpful!

Too bad the PostgreSQL repository is not advocated from postgis.org documentation. If I had found all this information easily, the first impression would have been much better! (I still didn't give up on postgis, just saying...)

Hingo, Good point. You aren't

Hingo,

Good point. You aren't the first to complain and I plan to do something about that soon. Here is another link you might be interested in of someone with similar issues, and Paul's pseudo rebuttal.

http://blog.cleverelephant.ca/2012/08/postgis-apologia.html

PostGIS devs have our ears wide open and will try to make things easier on folk. I forget what it was like starting out on PostGIS and difficulties that I now take for granted. It's good to hear this input.

It is in docs now

I have it linked in the 2.0 and 2.1 docs now and will have maintainers update their list.

http://www.postgis.org/documentation/manual-svn/postgis_installation.ht…

It's amazing how much of an audience you lose by the simplest of issues.

Thanks for your input.

Thank you! 131 Points in my

Thank you! 131 million Points in my PostGIS database now, and counting :-)

Btw, it's a common issue with FOSS manuals that they focus too much on the activity of compiling the source code. Even when the manual might say something about packages existing, the chapter on installation usually begins with explaining the process of building from source. This is poor prioritization, since most of your end users are not going o do that.

Couldn't agree with you

Couldn't agree with you more.

I think I'm the only PostGIS developer that thinks that :)
probably because I'm the only Windows developer in the group so I'm used to dealing
with a lot of users where compiling is a scary concept.

Unfortunately I wasn't able

Unfortunately I wasn't able to get "CREATE EXTENSION postgis" to work with the 9.1 binaries from yum.postgresql.org. After installing postgresql91-server and postgis91, I got this error:

could not open extension control file "/usr/pgsql-9.1/share/extension/postgis.control"

Unfortunately RTFM wasn't of any help this time either:

grep 'CREATE EXTENSION' /usr/share/doc/postgis91-1.5.5/*

So in a way I stand by my original conclusion that installing postgis is unnecesarily hard. Compiling from source isn't quite the same as ready for general use.

I forgot to mention CREATE

I forgot to mention

CREATE EXTENSION only works with PostGIS 2+ and it appears you have installed PostGIS 1.5.5. You need both PostGIS 2+ and 9.1+.

Yum has both 1.5 and 2.0.
Read:
http://people.planetpostgresql.org/devrim/index.php?/archives/64-PostGI…

I admit it could be easier. One of the other reasons aside from the the GPL issue why PostGIS is not packaged with PostgreSQL is that we have a different release cycle and PostGIS devs really liked that power (to control our own destiny if you will). This allows us to have many different versions of PostGIS on a given version of PostgreSQL which for an experienced PostGIS user is a great benefit but at the disadvantage of confusing the heck out of new comers (since e.g. oracle, SQL Server, MySQL one version for each version of the DB) :).

It's even more confusing in that some distros don't follow this policy. We package the windows build and offer PostGIS 2.0 on PostgreSQL 8.4-9.1 (for 32 and 64-bit) and PostGIS 1.5 on PostgreSLQ 9.4-9.1 (just for 32-bit) . OpenSuse GIS packager does similar I believe.

Thanks. Following

Thanks. Following instructions in that link it worked! This was good, because the Postgres 8.4 based install I got from opengeo actually core dumped when inserting Geography type of points. With 9.1 installation it works fine.

Thanks for your help!

Slight correction

Regarding Your example, slight correction to my last statement.

This is sufficient if you want to do a point radius check and should give the same result as your example
for PostGIS > 1.3+

SELECT ... WHERE
ST_DWithin(shape_column,
ST_GeomFromText('POINT(100000 200000)',312), 100);

This works because ST_DWithin contains both a bounding box check (to utilize available spatial index) and a short-circuit distance check which is more efficient than a regular distance check since it stops checking once its found any part within the required distance.

Pre-vote
2 years 9 months ago
The range query could be…
3 years ago
Interesting approach.
I have…
3 years 1 month ago
Try this one.
3 years 7 months ago
4 round trips
4 years 6 months ago
I tried to understand the…
4 years 6 months ago
Paxos in Cassandra
4 years 7 months ago
Thanks for sharing the pain…
4 years 7 months ago
Replacing the locks
5 years 4 months ago
10% bound on space-amp
5 years 7 months ago

OpenLife.cc

Open Source and Databases