A lot of us in the neighborhood health effects field create or use neighborhood contextual measures that are aggregations of population data from the Census or other large social surveys. For instance, common measures of neighborhood-level socioeconomic status, such as poverty rate, per capita income, and median household income, are all derived from Census data on individual respondents’ self-reported household income. Our recent work asks: how are our estimates of the association between a neighborhood-level socioeconomic variable and an individual-level health outcome affected when the incomes that individuals self-report to the Census include random measurement error?
We find that the effects of this type of measurement error depend a great deal on which neighborhood measure an investigator chooses and on the strategy used in the data analysis. In the face of measurement error in the underlying Census data, the use of contextual variables expressed as percentages (e.g. percent living in poverty) results in overestimates of association magnitude. In the face of the same measurement error, the use of contextual variables expressed on a full continuous scale (e.g. median household income) results in an equal probability of an overestimate or an underestimate of association magnitude in any specific study. If a contextual measure expressed as a percentage is further converted into quantile-based categories (e.g. top 20% of neighborhoods vs bottom 20% by poverty rate), the bias is removed. However, if the same variable is categorized using an external threshold (e.g. a high poverty neighborhood is defined as one where more than 20 percent of residents live in poverty), the bias remains.
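One way to see why quantile-based categories behave differently from a fixed cutoff is a small analytic sketch in Python. It is an illustration under strong simplifying assumptions, not the analysis from our article: true incomes in each neighborhood are assumed log-normal with the same spread, reporting error is assumed log-normal and identical everywhere, and the poverty line, standard deviations, and neighborhood medians are all hypothetical values chosen for the example.

```python
import math
from statistics import NormalDist

POVERTY_LINE = 25_000        # hypothetical income threshold, not an official figure
SPREAD_SD = 0.6              # assumed sd of true log income within a neighborhood
ERROR_SD = 0.5               # assumed sd of reporting error on the log scale
HIGH_POVERTY_CUTOFF = 0.20   # external threshold: more than 20 percent in poverty

# Hypothetical true median incomes for ten neighborhoods
medians = [30_000, 35_000, 40_000, 45_000, 50_000,
           60_000, 70_000, 80_000, 100_000, 120_000]

def poverty_rate(median_income, error_sd=0.0):
    """Share of residents whose (reported) income falls below the poverty line."""
    # Independent normal errors add in quadrature on the log scale
    total_sd = math.hypot(SPREAD_SD, error_sd)
    z = (math.log(POVERTY_LINE) - math.log(median_income)) / total_sd
    return NormalDist().cdf(z)

def top_quintile(rates):
    """Indices of the 20% of neighborhoods with the highest poverty rates."""
    k = max(1, len(rates) // 5)
    order = sorted(range(len(rates)), key=lambda i: rates[i], reverse=True)
    return set(order[:k])

def above_cutoff(rates):
    """Indices of neighborhoods classified as high poverty by the fixed cutoff."""
    return {i for i, r in enumerate(rates) if r > HIGH_POVERTY_CUTOFF}

true_rates = [poverty_rate(m) for m in medians]
reported_rates = [poverty_rate(m, ERROR_SD) for m in medians]

print("same top quintile:", top_quintile(true_rates) == top_quintile(reported_rates))
print("high poverty (true):    ", sorted(above_cutoff(true_rates)))
print("high poverty (reported):", sorted(above_cutoff(reported_rates)))
```

In this setup the error shifts every neighborhood's poverty rate but leaves the rank order intact, so top-quintile membership is unchanged, while the fixed 20 percent cutoff reclassifies a neighborhood whose true rate sits below it. Real data will not satisfy these assumptions exactly, so the sketch shows the mechanism rather than the size of any bias.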
The full presentation of our research into the effects of measurement error in the construction of contextual variables can be found in our recent article in Epidemiology.
The diagram below illustrates the effects of this type of measurement error and the choices an investigator can make to mitigate them. The top graph depicts individual-level Census income data for two hypothetical zip codes, along with two common aggregations of these data into zip-code-level contextual measures: percent living in poverty and median household income. The black arrows extending from the data points on the two lower graphs depict the effects of non-differential measurement error in the Census income data on estimates of the zip-code-level contextual measures, and the black arrows below the graphs indicate the bias resulting from analytic decisions under these conditions.
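The aggregation step described above can also be sketched as a small simulation. The Python below is a minimal illustration, not the simulation from our article: it assumes log-normal true incomes, multiplicative reporting error with no systematic direction, and made-up values for the poverty line, the error size, and the two zip codes' median incomes.

```python
import math
import random
import statistics

POVERTY_LINE = 25_000   # hypothetical threshold, not an official figure
SPREAD_SD = 0.6         # assumed sd of true log income within a zip code
ERROR_SD = 0.5          # assumed sd of reporting error on the log scale

def contextual_measures(incomes):
    """Return (share below the poverty line, median income) for one zip code."""
    below = sum(1 for x in incomes if x < POVERTY_LINE)
    return below / len(incomes), statistics.median(incomes)

def simulate_zip(median_income, n=20_000, seed=0):
    """Simulate true and error-prone reported incomes, then aggregate both."""
    rng = random.Random(seed)
    true = [rng.lognormvariate(math.log(median_income), SPREAD_SD)
            for _ in range(n)]
    # Non-differential error: same error distribution for every respondent
    reported = [x * math.exp(rng.gauss(0.0, ERROR_SD)) for x in true]
    return contextual_measures(true), contextual_measures(reported)

# Two hypothetical zip codes with different true median incomes
zip_codes = [("zip_a", 60_000), ("zip_b", 35_000)]
results = {name: simulate_zip(m, seed=i) for i, (name, m) in enumerate(zip_codes)}

for name, ((pov_true, med_true), (pov_rep, med_rep)) in results.items():
    print(f"{name}: poverty {pov_true:.3f} -> {pov_rep:.3f}, "
          f"median {med_true:,.0f} -> {med_rep:,.0f}")
```

Under these assumptions the reported percent living in poverty moves in the same direction in both zip codes, because error pushes more of the distribution across the fixed poverty line, whereas the reported median moves only slightly and in no consistent direction. Other error models, such as additive error or error that varies with income, could behave differently.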