Analyses of place and health have been largely cross-sectional, and working with longitudinal geographic data brings new challenges. Our group just published a manuscript detailing our work to clean and code data on all NYC metropolitan area businesses over the period 1990-2010. Our goal was to use twenty years of business establishment data to characterize neighborhood change in terms of the retail food environment, access to physical activity venues, access to medical facilities, and access to other commercial and not-for-profit establishments.
Our process included re-geocoding 3,161,715 business locations to avoid disproportionately missing data on older businesses; identifying and coding health-relevant businesses such as food sources and fitness venues across the years; and collapsing potential duplicate business records by location, year, and business category. We used spot-checking along the way, and the data are set up to allow sensitivity analyses that check how robust results are to these decisions as we move forward.
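To make the duplicate-collapsing step concrete, here is a minimal Python sketch of grouping records by location, year, and business category. It is an illustration only, not the code used for the manuscript: the field names (`name`, `lat`, `lon`, `year`, `category`), the sample records, and the choice to match locations by rounding coordinates to a fixed number of decimal places are all assumptions.

```python
from collections import defaultdict


def collapse_duplicates(records, precision=5):
    """Collapse records that share a rounded location, year, and category.

    `precision` is a hypothetical tuning knob: the number of decimal places
    kept when comparing coordinates. The rule used in the actual project
    may differ.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (
            round(rec["lat"], precision),
            round(rec["lon"], precision),
            rec["year"],
            rec["category"],
        )
        groups[key].append(rec)

    collapsed = []
    for members in groups.values():
        # Keep the first record of each group and note what was merged into it.
        kept = dict(members[0])
        kept["n_merged"] = len(members)
        kept["merged_names"] = sorted({m["name"] for m in members})
        collapsed.append(kept)
    return collapsed


# Made-up sample records, for illustration only.
sample = [
    {"name": "Fresh Mart", "lat": 40.71278, "lon": -74.00597,
     "year": 1995, "category": "supermarket"},
    {"name": "Fresh Mart Inc", "lat": 40.71278, "lon": -74.00597,
     "year": 1995, "category": "supermarket"},
    {"name": "City Gym", "lat": 40.73061, "lon": -73.93524,
     "year": 1995, "category": "fitness"},
]

collapsed = collapse_duplicates(sample)
for rec in collapsed:
    print(rec["name"], rec["year"], rec["category"], rec["n_merged"])
```

Re-running a sketch like this with different `precision` values is one simple way to see how sensitive establishment counts are to the matching rule.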
This effort was championed by lead author Tanya Kaufman, who has been engaged in it since her MPH practicum project using these data. Daniel Sheehan was the lead geographer on the project; he developed the re-geocoding strategies and created time-lapse visualizations of businesses entering and exiting the environment. One of Dan’s visualizations can be seen here. It shows the location of Healthy Food Outlets from 1990 to 2010.
The focus of this project was not only to understand and improve the quality of data for future analysis, but also to develop scalable approaches that can be used with the larger national dataset. We have recently been funded to purchase the nationwide business establishment data and to link these data to ongoing cohort studies of cardiovascular disease (R01AG049970-01A1, PI: Lovasi).