Python script to process and merge NYC MapPLUTO data


Daniel Sheehan

We are happy to share a Python script that downloads and compiles all of the current and archived versions of the New York City (NYC) Department of City Planning's (DCP) MapPLUTO data into a single file geodatabase, with a feature dataset for each year-version. The BEH GIS team developed the script to avoid downloading, unzipping and merging all of this data by hand, and we hope it will save time for anyone else who wishes to compile it. The script was developed by Daniel Sheehan and is based on an in-house urllib script for mining tract shapefiles by state from the US Census Bureau.

Click the link to get the Script link
Click the link to get the Readme link

More about the public release of MapPLUTO data…

In July of 2013, NYC DCP announced that, after nearly 10 years of licensing the data by borough for a fee, the most recent version of MapPLUTO (a combination of DCP's map geometry of tax blocks and lots and the Primary Land Use Tax Lot Output from the NYC Department of Finance) would be free to the public. This was a huge leap forward and an important milestone in NYC's progress towards open data. While many data sets had been free for public consumption in the past, MapPLUTO was often the most coveted. In the days and weeks that followed the release, many questions emerged about whether previous versions of MapPLUTO would be opened as well, and MapPLUTO license holders wondered if earlier versions could be shared freely. On Friday, December 6th, it became apparent to the NYC open data community that MapPLUTO data from 2003 to the present day was now freely available to use and share.

MapPLUTO data are available by year and version as individual .zip files. Within these .zip files are folders that contain one shapefile for each borough. From 2003 to 2013 there are 16 versions of the data, consisting of 80 shapefiles. It is fairly easy to download the data and start using it for one borough for one year. However, it takes a lot of time and effort to download and merge all of it manually: one could easily spend the better part of a day or two downloading, unzipping and merging these data sets into a file format that could hold them. The file size limitations of shapefiles mean that a single shapefile cannot contain the whole data set merged across boroughs. Hence, rather than immediately grabbing all the data in the normal point-and-click fashion and merging the datasets using the traditional ArcGIS UI, Dan decided to create a distributable script to share on GitHub. A user need only define a working directory, and the downloading, unzipping and merging of the data into a single file geodatabase, with feature datasets for each version, occurs automatically. This script can be accessed here on GitHub. The README file with more detailed instructions can be accessed here. So far users have reported download and completion times ranging from under 1 hour to 12 hours, depending on internet connection and processing power.
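To make the download and unzip steps concrete, here is a minimal sketch using only the Python 3 standard library. It is not the script published on GitHub: the function names, the `zips` and `unzipped` folder names, and the idea of passing one archive URL at a time are assumptions made for illustration, and the merge into a file geodatabase (which needs arcpy) is left out.

```python
import os
import shutil
import urllib.request
import zipfile


def download_zip(url, dest_dir):
    """Download one archive into dest_dir and return its local path."""
    os.makedirs(dest_dir, exist_ok=True)
    # Hypothetical naming rule: reuse the last segment of the URL.
    local_path = os.path.join(dest_dir, os.path.basename(url))
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path


def unzip(zip_path, dest_dir):
    """Extract an archive into a folder named after the archive."""
    name = os.path.splitext(os.path.basename(zip_path))[0]
    target = os.path.join(dest_dir, name)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def find_shapefiles(root):
    """Return the sorted paths of every .shp file under root."""
    found = []
    for folder, _subdirs, files in os.walk(root):
        for filename in files:
            if filename.lower().endswith(".shp"):
                found.append(os.path.join(folder, filename))
    return sorted(found)


def fetch_version(url, workspace):
    """Download, unzip and list the borough shapefiles for one version."""
    zip_path = download_zip(url, os.path.join(workspace, "zips"))
    extracted = unzip(zip_path, os.path.join(workspace, "unzipped"))
    return find_shapefiles(extracted)
```

Calling `fetch_version` once per archive URL, with the working directory as `workspace`, would yield the list of borough shapefiles that a later merge step could consume for that version.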

Because this script is Esri-based and relies on the arcpy Python module, users of QGIS and other open-source GIS software will not be able to execute it. The next step is to write an accompanying script that allows the user to download and compile everything into an open-source GIS format such as GeoJSON.
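As a rough idea of what the merge could look like in an open format, the sketch below combines several GeoJSON FeatureCollections into one using only the standard library. It is not the planned script. It assumes the borough shapefiles have already been converted to GeoJSON by some other tool and share one coordinate reference system, and the function names are hypothetical.

```python
import json


def merge_feature_collections(collections):
    """Combine several GeoJSON FeatureCollections into a single one."""
    merged = {"type": "FeatureCollection", "features": []}
    for collection in collections:
        if collection.get("type") != "FeatureCollection":
            raise ValueError("expected a GeoJSON FeatureCollection")
        # Features are appended as-is; no attributes are renamed or dropped.
        merged["features"].extend(collection.get("features", []))
    return merged


def merge_geojson_files(paths, out_path):
    """Merge the GeoJSON files in paths and write the result to out_path."""
    collections = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            collections.append(json.load(handle))
    merged = merge_feature_collections(collections)
    with open(out_path, "w", encoding="utf-8") as handle:
        json.dump(merged, handle)
    # Return the feature count so the caller can sanity-check the merge.
    return len(merged["features"])
```

Unlike a shapefile, a GeoJSON file has no fixed size ceiling, although a citywide tax lot file would still be large, so a real implementation might prefer to write one file per version.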

