Commandments for Variable Naming and Data Management

Mel Brooks and the 15 Commandments from History of the World Part 1

As we launched another multifaceted geographic data linkage study, our multi-institution team, which includes researchers at Drexel University, Columbia University, and the University of Washington, developed a set of commandments to streamline and harmonize our data management, variable naming, and data coding processes.


  1. Thou shalt not transmit HIPAA/IRB protected data, nor data protected by licensing agreement without PI approval.

Clearly, we want both to be responsible custodians of the data entrusted to us and to avoid getting into trouble.  For additional discussion of cautions around the common practice of using online tools to characterize addresses, see our recent commentary.

  1. Thou shalt always use YYYYMMDD when formatting date variable values, stored as a string.

Date storage was much discussed by our group, but ultimately we wanted a solution that would sort chronologically, be readable to humans, and be usable seamlessly across software packages that use different sentinel dates.
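A minimal sketch in Python of why this convention is convenient. The helper names and the sample dates are illustrative assumptions, not part of any actual study code. Because the most significant part (the year) comes first and every part is zero-padded, plain text sorting of YYYYMMDD strings matches chronological order, and the strings convert to native dates in whichever software is in use.

```python
from datetime import date, datetime


def to_yyyymmdd(d: date) -> str:
    """Format a date as a zero-padded YYYYMMDD string."""
    return d.strftime("%Y%m%d")


def from_yyyymmdd(s: str) -> date:
    """Parse a YYYYMMDD string back into a date object."""
    return datetime.strptime(s, "%Y%m%d").date()


# Hypothetical visit dates spanning the 1990s and 2000s.
visits = ["20010315", "19991231", "20000101"]

# Sorting the raw strings gives the same order as sorting the parsed dates.
sorted_as_text = sorted(visits)
sorted_as_dates = sorted(visits, key=from_yyyymmdd)
```

The same strings can be read by other packages without knowing which day each package counts as day zero, which is the portability concern behind the commandment.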

  1. Thou shalt always use YYYY when using a year in a variable name.

Given that our studies of adult health frequently span both the 1990s and 2000s, using four digits (rather than two) for the year whenever possible allows for easier conversion from wide to long format and for sorting in chronological order.

  1. Thou shalt prefer use of tall rather than wide data formats to avoid storing empty data and simplify query expressions.

As we move to using longitudinal data on where people live, and how their environment has changed over time, the structure of data becomes more complex.  Long format avoids storing fields for which many observations have no data.  However, the overarching goal is efficiency and usability, which may at times favor a wide format instead.
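To illustrate how four-digit years in variable names help with the wide-to-tall conversion, here is a small sketch using only the Python standard library. The field names (`id`, `bmi1998`, `bmi2004`) and the values are hypothetical. The year is parsed straight out of the variable name, and missing values are simply not stored in the tall table.

```python
import re

# Hypothetical wide data: one row per person, one column per measure-year.
wide_rows = [
    {"id": 1, "bmi1998": 24.1, "bmi2004": 25.3},
    {"id": 2, "bmi1998": None, "bmi2004": 27.8},
]

# A lowercase stem followed by exactly four digits, e.g. "bmi1998".
YEAR_SUFFIX = re.compile(r"^([a-z_]+?)(\d{4})$")


def wide_to_long(rows, id_field="id"):
    """Return one row per (id, measure, year), skipping missing values."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            match = YEAR_SUFFIX.match(name)
            if match is None or value is None:
                continue  # not a measure-year column, or nothing to store
            long_rows.append({
                id_field: row[id_field],
                "measure": match.group(1),
                "year": int(match.group(2)),
                "value": value,
            })
    return sorted(long_rows, key=lambda r: (r[id_field], r["measure"], r["year"]))


long_rows = wide_to_long(wide_rows)
```

With two-digit suffixes such as `bmi98` and `bmi04`, the parsed years would sort 2004 before 1998 unless extra century logic were added.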

  1. Thou shalt always use lowercase for variable names to avoid case sensitivity issues when jumping between software.

Inconsistent capitalization in variable names is a source of frustration for users of case-sensitive software such as Stata.  A typical scenario is that you have working syntax, receive an updated dataset with differences in capitalization (which a user of a case-insensitive package such as SAS may not be attentive to), and have to spend time troubleshooting and editing to get it to work again.  While conventions vary, we decided the simplest thing would be to use only lowercase in our variable names.
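One way to enforce this when a dataset arrives is to lowercase every variable name on import and fail loudly if two names differ only by case. The sketch below is an illustration under that assumption; the function name and sample variable names are hypothetical.

```python
def lowercase_names(names):
    """Lowercase variable names, refusing names that collide after lowercasing."""
    seen = {}
    for name in names:
        lowered = name.strip().lower()
        if lowered in seen:
            raise ValueError(
                f"{name!r} and {seen[lowered]!r} differ only by capitalization"
            )
        seen[lowered] = name
    return list(seen)  # dicts preserve insertion order


# Hypothetical names from an updated dataset with mixed capitalization.
clean = lowercase_names(["ID", "BMI1998", "Tract_FIPS"])
```

Raising an error on collisions, rather than silently keeping one column, avoids losing data when a case-insensitive package has exported two distinct variables that a case-sensitive one would treat as the same after cleaning.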

Posted in Methods, Tools

JAMA on Walking and Walkability

High and low walkability neighborhoods in NYC

Following up on its two recent articles about neighborhood walkability, including an editorial co-authored by Andrew Rundle, JAMA today published a Medical News and Perspectives article entitled “As Walking Movement Grows, Neighborhood Walkability Gains Attention”.  The article notes the various federal agencies that are working on improving neighborhood walkability, including the US Department of Health and Human Services, which launched the initiative “Step It Up! The Surgeon General’s Call to Action to Promote Walking and Walkable Communities”; the CDC-funded National Physical Activity Plan Alliance, with its forthcoming (expected early 2017) “Walking and Walkability Report Card”; and the collaboration of the USDOT, the CDC, and the American Public Health Association to release the online Transportation and Health Tool, which provides access to data on the health effects of transportation systems and includes a focus on active transport.

Regarding the lack of randomized trial data on neighborhood walkability and the paucity of longitudinal studies in the literature, the article quotes Jim Sallis saying that even without direct evidence of causality, “the correlational evidence is really piling up” and that “the risk of improving walkability appears very low, whereas the benefits could be very substantial.”

Posted in Physical Activity, Walkability

Steve Mooney receives Poster Award at Epidemiology Congress of the Americas 2016

Steve Mooney, a recently minted PhD who did his doctoral work with the BEH group, won a best poster presentation award at the 2016 Epidemiology Congress of the Americas for his work on the Neighborhood Environment-Wide Association Study design. Dr. Mooney’s poster, available as a PDF here, explored the potential to apply theory-agnostic empirical approaches to finding the neighborhood measures most predictive of physical activity among older adults in the NYCNAMES-II cohort.

Posted in Methods, Physical Activity, Social Determinants, Urban Design

Urban Design to Support Walking and Health

JAMA just published an editorial co-written by Andrew Rundle, entitled “Can Walkable Urban Design Play a Role in Reducing the Incidence of Obesity-Related Conditions?”.  The editorial provides a perspective on a study published in JAMA by Creatore et al. that assessed the prevalence of obesity and incidence of diabetes from 2001 to 2012, by level of neighborhood walkability, across 15 municipalities in Canada.

Urban design to support active transport (walking and cycling) is an attractive avenue for public health interventions to increase population levels of physical activity and reduce the burden of obesity and diabetes. In the urban design/planning literature walkability is typically described in terms of the “D variables“: density of population, density of residences, density of public transit stops, design of street networks, and destination accessibility.  In many cities there is not enough undeveloped space to create new large urban parks that support exercise and recreational physical activity. Local governments have some policy mechanisms for influencing neighborhood retail food access: tax and loan incentives can be used to promote the development of new supermarkets, but efforts to ban new fast food outlets are controversial and have not been found to be effective. However, improvements in neighborhood walkability can be promoted through permitting, zoning, land use regulations, and street design, activities all under local governmental control. In addition, although public transit receives state and federal funding, key decisions about transit capital investment and operations are made at the local level.


Furthermore, our work in NYC finds that variation in residential neighborhood walkability is more strongly associated with physical activity than is variation in residential neighborhood access to parks.  Similarly, we find that variation in neighborhood walkability is more strongly associated with body mass index than is variation in neighborhood access to healthy food outlets, fast food outlets, and park spaces.

Posted in Uncategorized

Can Big Data get us Better Estimates of Neighborhood Disorder?

Physical Disorder in Philadelphia estimated using Universal Kriging

At the Built Environment and Health group, we try hard to measure neighborhood characteristics accurately. We systematically audit Street View imagery, we use LiDAR scans to assess tree canopy, and we use business registration records to profile neighborhood retail. A lot of these measures are spatially interpolated. For example, it’s not feasible to collect pollen counts everywhere in the city, but we can take pollen count samples at a few locations and use those samples to estimate the pollen counts we would have measured in places we couldn’t measure.

One technique we use for spatial interpolation is ordinary kriging. Ordinary kriging uses the spatial correlation between sampled points – that is, on average, how similar are pairs of points at any given distance – to estimate, with confidence levels, measures at unobserved locations. Ordinary kriging was initially developed in geology – miners sampled minerals in locations that appeared promising, then analyzed the spatial variation in mineral content between the samples in order to identify potential gold deposits. We and others have borrowed this technique for neighborhood measures, like when we used ordinary kriging to estimate physical disorder levels throughout cities.
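The mechanics can be sketched in a few lines of Python. This is an illustration rather than the code behind any of our analyses: it assumes an exponential semivariogram with an arbitrary sill and range (in practice these would be fitted to the sampled data), uses only the standard library, and all sample locations and values are hypothetical. The weights are constrained to sum to one, and the kriging variance is what provides the uncertainty around each estimate.

```python
import math


def semivariogram(h, sill=1.0, range_=500.0):
    """Assumed exponential semivariogram with no nugget; parameters are not fitted."""
    return sill * (1.0 - math.exp(-h / range_))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ordinary_krige(points, values, target):
    """Return (estimate, kriging variance, weights) at the target location."""
    n = len(points)
    # Semivariances between all pairs of samples, plus the unbiasedness row/column.
    a = [[semivariogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    # Semivariances between each sample and the target.
    b = [semivariogram(math.dist(p, target)) for p in points] + [1.0]
    solution = solve(a, b)
    weights, lagrange = solution[:n], solution[n]
    estimate = sum(w * z for w, z in zip(weights, values))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + lagrange
    return estimate, variance, weights


# Hypothetical disorder scores sampled at the corners of a 100 m square.
samples = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
scores = [1.0, 2.0, 3.0, 4.0]
estimate, variance, weights = ordinary_krige(samples, scores, (50.0, 50.0))
```

Because the target here is equidistant from all four samples, each sample receives the same weight. Moving the target toward one corner shifts weight to that sample and shrinks the kriging variance.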

But a key assumption underlying an ordinary kriging model is that there’s a continuous correlation for the measures of interest – on average, mineral content at a given location looks more like the mineral content 50 feet away than the mineral content 100 feet away. We started wondering whether the assumption of continuity holds for neighborhood disorder.  For example, if physical conditions are worse on the ‘wrong side of the tracks’, then a measure of nearby conditions that happens to be on the opposite side of the tracks might tell us less than a measure that’s further away but on the same side of the tracks. Maybe, we thought, if we pull external information like the side of the tracks into our interpolation models, we can interpolate more accurately.

Posted in Methods, Physical Disorder, Street View

Maintaining Human Subjects Protections in Neighborhood Health Effects Research


Geocoding a study subject’s address with Google Maps or Earth transmits that personal identifier to Google.

We recently published a commentary in the American Journal of Public Health describing our concerns about protecting study subject anonymity when online geographic and data tools are used in neighborhood health effects research.  Examples of neighborhood data available from these tools include crime statistics from the New York Times and EveryBlock, neighborhood walkability scores, restaurant locations from Yelp, and geocoding services from Google Maps. These online resources create new opportunities for medical geographic research but also create new ways in which study subject confidentiality can be broken.  Typically these web tools allow a user to enter an address into an online interface and receive back data about the geographic area around that location.  We have seen study protocols, training materials, and published papers involving the submission of study subjects’ home and/or work addresses to such web services.  The broad terms of service on most websites usually permit these service providers to freely use any data passed to them rather than hew to the strict rules established by institutional review board (IRB) protocols to protect human subjects.  Furthermore, online advertising tracking cookies on the researcher’s (or research assistant’s) browser could be used to release respondent addresses to additional parties without the researcher’s knowledge.  In the commentary we describe approaches to using these online services for neighborhood health effects research while maintaining human subjects protections.

Posted in Methods, Tools, Uncategorized

Measuring Pedestrian Activity Using GPS Logger Data

It has been suggested that GPS monitoring data can be used to estimate distances traveled and speeds of travel during active and non-active travel journeys and that, when combined with accelerometer monitoring, GPS data can be used to identify travel mode.  We tested whether the distances between successively captured GPS way points can be used to measure distances walked in varying environments in NYC. Students walked a series of structured routes in areas with high and low building bulk density and on streets with high and low tree canopy cover while wearing GPS monitors.  The sums of distances between successive GPS way points overestimated travel distances, and the overestimates were larger in areas with high building bulk density and on streets with high tree canopy cover. Algorithms using distances between successive GPS points to infer speed or travel mode may misclassify trips differentially across built environment contexts.  The abstract can be found HERE and the full paper will be available in the American Journal of Public Health.
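To make the measure concrete, here is a rough Python sketch of summing distances between successive way points. It is not the code used in the paper: it assumes great-circle (haversine) distances on a spherical Earth, and the coordinates and the size of the simulated side-to-side error are hypothetical. It shows why positional scatter can only lengthen, never shorten, the summed path along a straight route.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, a spherical approximation


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def path_length_m(waypoints):
    """Sum of distances between successive way points."""
    return sum(haversine_m(a, b) for a, b in zip(waypoints, waypoints[1:]))


# Hypothetical straight walk north along one street.
true_route = [(40.7500 + i * 0.0001, -73.9900) for i in range(11)]

# The same walk with simulated GPS error alternating east and west of the street.
noisy_route = [
    (lat, lon + (0.0001 if i % 2 else -0.0001))
    for i, (lat, lon) in enumerate(true_route)
]

true_length = path_length_m(true_route)
noisy_length = path_length_m(noisy_route)
```

In this toy example the zigzagging way points give a summed length well above the length of the straight route, which is the direction of error reported above for environments where GPS signal quality is poorer.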

Below is an image of the GPS data collected during walks along streets in areas with low and high building bulk density.

GPS data collected during walks along streets in areas with low (left side) and high (right side) building bulk density. Image by Dan Sheehan.

Posted in Active Transport, GPS, Physical Activity

Maps of Neighborhood Physical Disorder

The Journal of Maps recently published our article showing a high resolution map of neighborhood physical disorder in New York City.

Physical disorder – the deterioration of urban spaces owing to social forces favoring neglect and abandonment – has long been of interest to social scientists [1, 2].  Criminologists and sociologists have debated the controversial ‘broken windows’ theory that disorder encourages violent crime [3, 4]. Separately, psychologists and psychiatric epidemiologists have investigated whether living amidst disorder negatively affects mental health, not only directly as stress induced by encountering a chaotic environment triggers earlier cognitive decline [5] but also indirectly as residents adopt coping mechanisms such as alcohol use that themselves trigger longer-term harms [6].

The data underlying the map was collected using neighborhood audits implemented via Street View. In addition to data collection in NYC, the team collected neighborhood physical disorder data from San Jose, California; Detroit, Michigan; and Philadelphia, Pennsylvania.  Below is a heat map of Philadelphia showing the distribution of neighborhood physical disorder across the city.

Neighborhood physical disorder in Philadelphia. Darker areas within Philadelphia have higher levels of physical disorder.

Posted in Uncategorized

Our Pedestrian Injury Research gets Further Coverage.

The Mailman School blog reached out to Steve Mooney to discuss our research on pedestrian injuries.  The post shows a series of Street Views of the key features that were associated with injuries.  The article is Here. And an article in New Scientist. And on

Posted in Uncategorized

Using Google Street View to Understand Pedestrian Injury Risk

Street Viewing 125th Street

In 2013, an estimated 70 000 pedestrians were injured or killed by motor vehicles in the United States. In New York City more pedestrians than vehicle occupants have been killed by motor vehicles each year since at least 1910.  Pedestrian safety is not only vital for public health directly through reduced traffic-related morbidity and mortality, but also indirectly as the perception of increased safety from traffic encourages outdoor physical activity, with consequent mental and physical health benefits.

We just published an article in the American Journal of Public Health in which we use Google Street View to identify characteristics of streets and intersections associated with pedestrian injuries and fatalities.  Following up on our work using Street View to conduct virtual street audits (1, 2, 3), we used the CANVAS system to collect data on built environment characteristics at street intersections with varying numbers of pedestrian injuries.  Higher counts of pedestrian injuries at intersections were associated with the presence of nearby billboards and bus stops.  Injury incidence per pedestrian was lower at intersections with higher estimated pedestrian volumes.

The use of virtual street audits allowed us to complete the research in a much shorter time period than comparable studies that use in-person audits to collect data at intersections. We are planning to expand this research to conduct a nationwide study of built environment risk factors for pedestrian injury.

Jerome Ave and Fordham Road in the Bronx, the intersection with the highest number of injuries in our study.

Posted in CANVAS, Pedestrian Injury, Safety, Street View