Monday, March 10, 2014

Spatial Data Structures: from R to PostGIS and back to R

Last friday I had a chat on twitter with ElCep and Joel G about the possibility to send R's spatial data structures to a PostGIS enabled database and retrieve them back to R.
Just right now, R can be used as a powerful GIS tool and the developement of spatial-oriented packages proceeds at a steady state .
On the other hand PostGIS is an unvaluable resource for storing and managing godzillions of vector and raster bits.

Coupling these two tools is like having a swiss knife for doing both statistics and GIS analyses when your datastests become bigger and bigger [1].

OK... after this cheesy introduction, let's start from the beginning and suppose we have just a point location called 'home' in x= 1247289, y=5805658. We would like to generate a buffer of 1 km around the point using PostGIS and then get the resulting polygon back in R

Although the buffering operation can be done both completely in R and in PostGIS, this approach shows the trick to interoperate between the two software using the Well Known Text encoding of geometries, which is an OGC standard [2]. WKT is the "lengua franca" between R and PostGIS.

So, let's get our hand dirty on it: first we push the point into R.
point = data.frame(x= 1247289, y= 5805658, name= 'home')
With the R package sp [3] we transform it in a spatialPointsDataFrame
require(sp)
coordinates(point) = c("lon","lat")
proj4string(point) = CRS("+init=epsg:3857")
With these operation we just trasform a dataframe in something different, a spatial structure, with a defined position on the earth surface using a Spherical Mercator Coordinates Reference System (CRS).
Now, to use this point inside PostGIS, we have to push it towards the database by the WKT encoding: to do so we'll use the functions in the rgeos package [4].
If you look at the documentation rgeos is a powerful GIS tool allowing calculations of buffers, intersections... Nothing surprising  here: the topological library GEOS [5], on which rgeos is built on, is one of the pillars for the PostGIS engine and for almost every other FOSS GIS.
require(rgeos)
wktPoint = writeWKT(point)
So now we have our point geometry ready for PostGIS. The trick is to use the ST_GeomFromText PostGIS' function that takes a WKT
as input and transform it in the internal encoding of the RDBMS. By the way, the encoding of our point in the DBMS is "0101000020110F000000000000390833410000008096255641", and so we just build the query as:
query = paste0("SELECT ST_Buffer(ST_GeomFromText('", wktPoint ,"', 3857),1000)")

Nothing fancy in this spatial query: it just takes our point and builds a 1 km buffer around it. Pay attention to the double and single quotes: the WKT is a string of text and so it must be single quoted in the SQL.

If you try to fire the query from inside R (we will se how in a minute how to do it) you will get an unespected result:
RS-DBI driver warning: (unrecognized PostgreSQL field type geometry (id:16432) in column 0)
This puzzling message stands for: R does not understand the internal binary encoding of geometries of PostGIS and takes the result like a very long string of characters ("0103000020110F0000010000002100000..."). To bypass this problem, the resulting polygon (the buffer) must be transfored in a WKT wrapping the ST_AsText function around the ST_Buffer function. The query becomes:
query = paste0("SELECT ST_AsText(ST_Buffer(ST_GeomFromText('", wktPoint ,"', 3857),1000)) AS polygon")
To connect R to PostGIS we use the RPostgreSQl package [6] and customize the connection for your PostGIS enabled database:
require(RPostgreSQL)
connection = dbConnect(PostgreSQL(), dbname=myDB, host=myHost, user=myUser, password=my****)
buffer = dbGetQuery(connection, query)
dbDisconnect(connection)
Now we have a polygon correctly encoded in WKT. We can use the rgeos' function readWKT to tranform it directly in a SpatialPolygons structure:
poly=readWKT(buffer)
Since WKT deos not include a spatial reference along with the geometry, we can add the Coordinate System using
proj4string(poly) = CRS("+init=epsg:3857")
...And that's all, for now.
As said before, an operation like this could not be the best example of an optimized GIS analysis, but PostGIS + R could be another pretty tool in the pocket to work on spatial data.