Saturday, March 22, 2014

When there's no more room in hell... contagious models and zombies in R (First try)

On February I decided to attend the Model Thinking Coursea's MOOC. After learning the basics of 'contagion models', I decided to dig a little more into the topic of SIR models  (Susceptible-Infected-Removed).
Being a total noob of this field I surfed the internet: I was nicely surprised to find the paper of Munz et al.,2009 where the SIR approach has been used to describe the zombie apocalypse.
I am a big fan of zombie movies  (The Walking Dead is one of my preferred TV serie) so I decided to  translate the basic model by Munz et al. in R, with some further simplifications introduced by the authors.
OK, let's start with a little bit of theory about an outbreak of a zombie infection:
  • We know from the zombie films that we are all infected and so we are all in the Susceptible class (S).  
  • When one in the S class deceases, for natural causes or by a zombie attack, moves to the Removed class (R). 
  • People in the R class can resurrect and become zombie (parameter ζ): from this moment a zombie (now in the Z class) start to wander craving for for human flesh and trying to bite the susceptibles. 
  • As explained the paper, people in class S can avoid zombification through an altercation with a zombie by defeating it, cutting its head or destroying its brain and so resisting the infection (parameter α). 
  • In the unfortunate case that a Susceptible succumbs to an attack by an undead, the infection transmits from the zombie to the Susceptible, the latter will die and  briefly resurrect as member of the zombie class (transmission parameter β).
The flowchart of the SZR model s described in the image below.

Flowchart of the SZR model

Thefollowing system of nonlinear differential equation is the formalization of the SZR model:


To solve the model in R I'll use the deSolve package by Soetaert & Petzoldt, thanks to this great piece of software, the formalization is pretty just straightforward, like writing the equations with a pencil on a sheet of paper:

require(deSolve)

SZR = function(t, state, parameters) {
  with(as.list(c(state,parameters)), {
    dS = -beta*S*Z
    dZ =  beta*Z*S + zeta*R - alpha*S*Z
    dR =  alpha*S*Z - zeta*R
    
    return(list(c(dS, dZ, dR)))
  })
}

times = seq(from= 0, to= 10, by= 0.01)

state = c(S= 500, Z= 0.0, R= 1.0)

parameters = c(alpha= 0.03, beta= 0.0445, zeta=0.2)

output = lsoda(y=state, times=times, func=SZR, parms= parameters)

As written in the code above, the initial state is represented by 500 Susceptibles and just one Removed (case 0). The parameters used in the model are α=0.03, β=0.0455, ζ=0.2, as proposed by Witkowski and Blais, 2013 analyzing the evolution of the infection from Shaun of the Dead and Night of the Living Dead (hey, data collecting can be fun!)

The results of the modelled outbreak is shown below:
The outbreak of the zombie infection, the solution of the SZR basic (and simplified) model
No hope here: this model predicts the doomsday with zombies quickly infecting everyone in a very short time  resulting in the collapse of the whole society.

Witowsky & Blais are very critic about the basic model proposed by Munz et al. and indeed three others models are proposed in the paper. I just agree with the critics: the ζR assumption is not reasonable (as far we can use the word 'reasonable' talking about a zombie outbreak): with this parametrization, when zombies are killed they can be recycled from the R group to the Z group again. Instead, when a zombie is defeated by destroying its brain it cannot resurrect anymore.

But for this first try, i think it is just enough. Lessons learned so far:
  1. the SIR models are fun.
  2. the SIR models with zombies are even funnier (at least for me, a zombie geek).
  3. the deSolve package for R is really powerful but surprisingly easy to use.
In the next post I will try to implement a different and more realistic model for the outbreak of the zombie infection and see the differences with the one just proposed here. Stay tuned, and just in case you meet a zombie... just run and 'don't look back'.

Monday, March 10, 2014

Spatial Data Structures: from R to PostGIS and back to R

Last friday I had a chat on twitter with ElCep and Joel G about the possibility to send R's spatial data structures to a PostGIS enabled database and retrieve them back to R.
Just right now, R can be used as a powerful GIS tool and the developement of spatial-oriented packages proceeds at a steady state .
On the other hand PostGIS is an unvaluable resource for storing and managing godzillions of vector and raster bits.

Coupling these two tools is like having a swiss knife for doing both statistics and GIS analyses when your datastests become bigger and bigger [1].

OK... after this cheesy introduction, let's start from the beginning and suppose we have just a point location called 'home' in x= 1247289, y=5805658. We would like to generate a buffer of 1 km around the point using PostGIS and then get the resulting polygon back in R

Although the buffering operation can be done both completely in R and in PostGIS, this approach shows the trick to interoperate between the two software using the Well Known Text encoding of geometries, which is an OGC standard [2]. WKT is the "lengua franca" between R and PostGIS.

So, let's get our hand dirty on it: first we push the point into R.
point = data.frame(x= 1247289, y= 5805658, name= 'home')
With the R package sp [3] we transform it in a spatialPointsDataFrame
require(sp)
coordinates(point) = c("lon","lat")
proj4string(point) = CRS("+init=epsg:3857")
With these operation we just trasform a dataframe in something different, a spatial structure, with a defined position on the earth surface using a Spherical Mercator Coordinates Reference System (CRS).
Now, to use this point inside PostGIS, we have to push it towards the database by the WKT encoding: to do so we'll use the functions in the rgeos package [4].
If you look at the documentation rgeos is a powerful GIS tool allowing calculations of buffers, intersections... Nothing surprising  here: the topological library GEOS [5], on which rgeos is built on, is one of the pillars for the PostGIS engine and for almost every other FOSS GIS.
require(rgeos)
wktPoint = writeWKT(point)
So now we have our point geometry ready for PostGIS. The trick is to use the ST_GeomFromText PostGIS' function that takes a WKT
as input and transform it in the internal encoding of the RDBMS. By the way, the encoding of our point in the DBMS is "0101000020110F000000000000390833410000008096255641", and so we just build the query as:
query = paste0("SELECT ST_Buffer(ST_GeomFromText('", wktPoint ,"', 3857),1000)")

Nothing fancy in this spatial query: it just takes our point and builds a 1 km buffer around it. Pay attention to the double and single quotes: the WKT is a string of text and so it must be single quoted in the SQL.

If you try to fire the query from inside R (we will se how in a minute how to do it) you will get an unespected result:
RS-DBI driver warning: (unrecognized PostgreSQL field type geometry (id:16432) in column 0)
This puzzling message stands for: R does not understand the internal binary encoding of geometries of PostGIS and takes the result like a very long string of characters ("0103000020110F0000010000002100000..."). To bypass this problem, the resulting polygon (the buffer) must be transfored in a WKT wrapping the ST_AsText function around the ST_Buffer function. The query becomes:
query = paste0("SELECT ST_AsText(ST_Buffer(ST_GeomFromText('", wktPoint ,"', 3857),1000)) AS polygon")
To connect R to PostGIS we use the RPostgreSQl package [6] and customize the connection for your PostGIS enabled database:
require(RPostgreSQL)
connection = dbConnect(PostgreSQL(), dbname=myDB, host=myHost, user=myUser, password=my****)
buffer = dbGetQuery(connection, query)
dbDisconnect(connection)
Now we have a polygon correctly encoded in WKT. We can use the rgeos' function readWKT to tranform it directly in a SpatialPolygons structure:
poly=readWKT(buffer)
Since WKT deos not include a spatial reference along with the geometry, we can add the Coordinate System using
proj4string(poly) = CRS("+init=epsg:3857")
...And that's all, for now.
As said before, an operation like this could not be the best example of an optimized GIS analysis, but PostGIS + R could be another pretty tool in the pocket to work on spatial data.