Happy GeoHacking

I just finished teaching a short workshop on GIS in Python. One thing may seem odd to those of you who use FOSS packages for GIS: I didn’t use GeoPandas at all.

This wasn’t a constraint I wanted. Just getting pip-installable packages like Shapely and PySAL installed was difficult enough: the admins didn’t want to install one of the big scientific Python distributions and would only allow tools with a minimal footprint.

So, in putting the workshop together, I focused on Pandas, PySAL, and Shapely. The students built an analogue of the GeoPandas dataframe by constructing ordinary dataframes with a column (a Series) of Shapely objects. This was remarkably easy to do, and it helped the attendees understand how spatial operations on Shapely objects work.
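A minimal sketch of that hand-built analogue, assuming pandas and Shapely 2.x are installed. The column names and geometries are invented for illustration; this is not the workshop material itself. The point is that "spatial methods" reduce to applying Shapely calls across an object column.

```python
import pandas as pd
from shapely.geometry import Point, Polygon

# A plain pandas DataFrame whose "geometry" column holds Shapely objects:
# a hand-rolled stand-in for a GeoDataFrame, with no GeoPandas or Fiona.
parcels = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "geometry": [
            Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
            Polygon([(1, 0), (3, 0), (3, 1), (1, 1)]),
            Polygon([(5, 5), (6, 5), (6, 6), (5, 6)]),
        ],
    }
)

# Geometric properties are just Series.apply calls on the Shapely objects.
parcels["area"] = parcels["geometry"].apply(lambda g: g.area)
parcels["centroid"] = parcels["geometry"].apply(lambda g: g.centroid)

# A spatial predicate against a query geometry becomes a boolean row mask.
query = Point(0.5, 0.5).buffer(1.0)
near = parcels[parcels["geometry"].apply(query.intersects)]
```

Here `near` keeps parcels "a" and "b". Pairwise predicates work the same way, e.g. `parcels["geometry"][0].touches(parcels["geometry"][1])`.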

Overall, though, I was struck by how easy it was to replicate the core class features of GeoPandas, notably without a heavy IO library like Fiona. Every few months, the PySAL team discusses whether to migrate to Fiona on the backend instead of our existing pure-Python readers and writers. I’ve been fixing up that code in preparation for a Python 3 release of PySAL, so moving to something like Fiona looked like it would save a ton of work. But adding a dependency on OGR, with its costs in speed and installation difficulty, has never been a popular option.

So, just now, I pushed up a pseudo-fork of GeoPandas that keeps the main functionality of GeoPandas while stripping out every reference or call to Fiona or OGR. The idea is that it should fit anywhere PySAL fits, and serve as an alternative frontend to PySAL’s current heavy reliance on raw numpy matrices and vectors.

But, having recently re-read ESR’s essay “How To Become A Hacker,” I wondered how much this conversion/strip job would actually help people. In it, Eric Raymond argues that no problem should ever have to be solved twice.

And in this case, I’ve certainly re-solved the problem of reading spatial data into a GeoDataFrame. Notably, I’ve done it with a backend that is far less general than the existing solutions!

So, why did I do this? I can certainly handle an OGR install. And anyone who wants to use PySAL on a GeoPandas-like dataframe should be able to string together simple calls to numpy.array to cast their dataframes down to the vectors PySAL uses. Meanwhile, I’m still working on a core issue of GeoPandas/PySAL interoperability: constructing a spatial weights object inline with the GeoDataFrame.
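A sketch of those numpy.array casts, assuming pandas, numpy, and Shapely 2.x. The column names and the `unit_square` helper are hypothetical. The shapes follow PySAL’s conventions: regression routines take `y` as an n x 1 array and `X` as n x k, and distance-based weights builders take an n x 2 array of point coordinates.

```python
import numpy as np
import pandas as pd
from shapely.geometry import Polygon


def unit_square(x, y):
    """Hypothetical helper: a 1 x 1 square with lower-left corner at (x, y)."""
    return Polygon([(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)])


# Hypothetical attribute columns plus a Shapely geometry column.
df = pd.DataFrame(
    {
        "price": [10.0, 12.5, 9.0, 11.0],
        "income": [3.1, 4.0, 2.2, 3.5],
        "crime": [20.0, 15.0, 30.0, 18.0],
        "geometry": [unit_square(0, 0), unit_square(1, 0),
                     unit_square(0, 1), unit_square(1, 1)],
    }
)

# Dependent variable as an (n, 1) column vector: selecting with a list of
# one column name keeps the result two-dimensional.
y = np.array(df[["price"]], dtype=float)

# Explanatory variables as an (n, k) matrix.
X = np.array(df[["income", "crime"]], dtype=float)

# Geometry cast down to an (n, 2) array of centroid coordinates.
coords = np.array([(g.centroid.x, g.centroid.y) for g in df["geometry"]])
```

Nothing here is PySAL-specific. Once the dataframe is reduced to plain arrays, any PySAL routine that takes arrays can consume them.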

I guess I’m happy that this was born out of a realization I had during that workshop, and it was a great exercise in (yet more) IO hacking with PySAL. For anyone who can’t manage an OGR install, and so can’t use Fiona or GeoPandas, this geodf contrib module could fill the gap. Going forward, though, I think I’ll focus more on linking GeoDataFrame instances to PySAL’s analytics methods (using Patsy and a fast Shapely-to-PySAL weights constructor) than on replicating an IO framework that Fiona already makes more general and easier to use.
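One common way to make a fast contiguity constructor is to hash vertices instead of testing every pair of polygons. Below is a minimal, standard-library-only sketch of queen contiguity under that approach. It is not PySAL’s implementation. It assumes neighbouring polygons share identical vertex coordinates, and its input is a list of exterior rings, such as `list(poly.exterior.coords)` for each Shapely polygon. The result is a dict of neighbour lists, the form PySAL’s `W` class accepts as its `neighbors` argument. `queen_neighbors` and `square` are hypothetical names.

```python
from collections import defaultdict
from itertools import combinations


def queen_neighbors(rings):
    """Queen contiguity: polygons sharing at least one vertex are neighbours.

    `rings` is a list of coordinate sequences, one per polygon. Runtime is
    roughly linear in the total number of vertices, because polygons are
    only compared when they meet at a shared vertex.
    """
    # Map each vertex to the set of polygon ids that use it.
    by_vertex = defaultdict(set)
    for pid, ring in enumerate(rings):
        for xy in ring:
            by_vertex[tuple(xy)].add(pid)

    # Every pair of polygons meeting at a vertex becomes a neighbour pair.
    neighbors = {pid: set() for pid in range(len(rings))}
    for pids in by_vertex.values():
        for i, j in combinations(sorted(pids), 2):
            neighbors[i].add(j)
            neighbors[j].add(i)

    # Sorted lists of neighbour ids, keyed by polygon id.
    return {pid: sorted(nbrs) for pid, nbrs in neighbors.items()}


def square(x, y):
    """Closed ring for a 1 x 1 square with lower-left corner at (x, y)."""
    return [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1), (x, y)]


# A 2 x 2 grid of squares (all meet at (1, 1)) plus one isolated square.
w = queen_neighbors([square(0, 0), square(1, 0), square(0, 1),
                     square(1, 1), square(5, 5)])
```

Rook contiguity would hash shared edges (sorted vertex pairs) instead of single vertices.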

All in all, swapping in PySAL.core.IOHandlers as the main engine driving IO has convinced me that Fiona’s approach is probably still the right way to do geospatial FileIO: an internal GeoJSON representation, fully abstracted from whether the underlying data is GeoJSON, a shapefile, a coverage, PostGIS, or anything else.
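To make that design concrete, here is a toy, standard-library-only sketch. Every reader, whatever its source, yields GeoJSON-style Feature dicts, so downstream code only ever sees one representation. The reader functions and the `Pt` class are hypothetical. The second reader relies on the `__geo_interface__` convention that Shapely geometries implement; `Pt` stands in for such an object.

```python
import csv
import io


def features_from_csv(text, x="lon", y="lat"):
    """Hypothetical reader: CSV rows with coordinate columns -> Features."""
    for row in csv.DictReader(io.StringIO(text)):
        props = {k: v for k, v in row.items() if k not in (x, y)}
        yield {
            "type": "Feature",
            "geometry": {"type": "Point",
                         "coordinates": (float(row[x]), float(row[y]))},
            "properties": props,
        }


def features_from_geo_interface(objs, attrs):
    """Hypothetical reader: objects exposing __geo_interface__ -> Features."""
    for obj, props in zip(objs, attrs):
        yield {"type": "Feature",
               "geometry": obj.__geo_interface__,
               "properties": dict(props)}


def feature_collection(features):
    """The single internal representation that downstream code consumes."""
    return {"type": "FeatureCollection", "features": list(features)}


class Pt:
    """Minimal stand-in for a geometry object implementing __geo_interface__."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    @property
    def __geo_interface__(self):
        return {"type": "Point", "coordinates": (self.x, self.y)}


# Two different sources, one representation.
from_csv = feature_collection(
    features_from_csv("name,lon,lat\nstation,-122.3,47.6\n"))
from_objs = feature_collection(
    features_from_geo_interface([Pt(-122.3, 47.6)], [{"name": "station"}]))
```

Both collections come out identical. Supporting a new format only means writing another reader that yields Features, which is the kind of generality Fiona provides on top of OGR.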

imported from: yetanothergeographer