1. Introduction
Physical inactivity is a major modifiable risk factor for morbidity, disability and premature mortality worldwide [
1]. Physical activity promotion serves as a key public health strategy [
2]. However, four in five American adults fail to meet guidelines-recommended physical activity levels [
3]. While the traditional perspective mainly blamed individuals for their sedentary lifestyles, increasing attention has shifted to the environmental determinants of health behavior, that is, how the political, economic, ecological, and social contexts in which people are born, live, work and age affect their physical activity [
4]. For instance, the availability and proximity of parks, bike lanes and recreational facilities have been consistently linked to increased physical activity and active commuting [
5]; whereas neighborhood crime rate is inversely associated with outdoor activities and sports among local residents [
6].
One major challenge in understanding the complex relationship between environment and physical activity is that both factors vary substantially across the U.S. While spatial variations in the prevalence of physical inactivity have been consistently documented [
7], little is known regarding the potential spatial heterogeneity of the environmental determinants of physical inactivity, that is, whether and to what extent spatial variations in physical inactivity are associated with spatial variations in environmental attributes. Examining geographical variations in the relationship between environment and physical activity could help customize and prioritize population-level policy interventions and improve their effectiveness in reducing sedentary behavior and promoting a healthier lifestyle. For example, if a high crime rate, air pollution, and a lack of parks and bike lanes are found to be the respective key drivers of physical inactivity in three different geographical areas, then instead of launching a universal intervention across all three locations, customized interventions of law enforcement, emission regulation, and infrastructure enhancement should be carefully evaluated and implemented in the corresponding areas.
Conventional models are mostly global: they assume and estimate a single effect that remains constant across geographical locations. In this study, we assessed the geographical variations in the environmental determinants of physical inactivity among U.S. adults using geographically weighted regression (GWR). GWR relaxes the assumptions of statistical independence between observations and of spatial stationarity, and can provide local estimates specific to each geographical area of interest [
8]. The unique features of GWR make it well suited to examine the differential environmental impacts on physical activity across the nation.
2. Methods
2.1. Data
Data on county-level prevalence of leisure-time physical inactivity came from the Centers for Disease Control and Prevention (CDC)’s County Data Indicators (CDIs) [
9]. The CDIs are estimated based on the annual survey data collected from the Behavioral Risk Factor Surveillance System (BRFSS). BRFSS is the nation’s primary system of health-related repeated cross-sectional telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors, chronic conditions, and use of preventive services. BRFSS questionnaires, sampling design and survey datasets can be found elsewhere [
10]. Since 2001, the BRFSS question regarding leisure-time physical inactivity has been, “During the past month, other than your regular job, did you participate in any physical activities or exercises such as running, calisthenics, golf, gardening, or walking for exercise?” Leisure-time physical inactivity was identified from answers of “no” to this question.
Environmental quality is the characteristics and properties of the environment that impact human beings and other organisms [
11]. Developed by the Environmental Protection Agency (EPA), the Environmental Quality Index (EQI) is a comprehensive measure of county-level environmental conditions that affect human health [
11,
12]. Five subdomains contribute to environmental quality: air, water, land, built, and social environments. The air subdomain is constructed based on two data sources that document monitoring, emissions, and modeled estimates of criteria and hazardous air pollutants. The water subdomain is constructed based on nine data sources that document drinking water quality, public water supply, drought, and pollutants in rainfall and recreational water. The land subdomain is constructed based on 12 data sources that document agriculture, industrial facilities, geology and mining, and land cover. The built subdomain is constructed based on four data sources that document traffic and transit, pedestrian safety, access to physical activity, food environment, and school environment. The social subdomain is constructed based on three data sources that document county-level sociodemographics (e.g., percent of non-English speakers, and median household income), mean number of violent crimes per capita, and housing conditions (e.g., percent of renter-occupied units, percent of vacant units, and median number of rooms per house). Each subdomain forms its own separate index, and the overall EQI is the sum of scores from all five subdomains. Both the overall EQI and its five subdomains are standardized to have a mean of zero and a standard deviation of one. Therefore, a positive EQI denotes a higher-than-average environmental quality, and a negative EQI denotes a lower-than-average environmental quality. The EPA provides no specific cutoffs for the standardized EQI scores, so this study used the scores as continuous measures. Detailed information regarding the EQI construction and relevant data sources can be found elsewhere [
11,
12].
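The standardization described above can be illustrated with a minimal sketch. The function name, the toy values, and the use of the population standard deviation are assumptions made for illustration only; they are not taken from the EPA's EQI construction.

```python
import numpy as np

def standardize(scores):
    """Rescale raw index scores to a mean of 0 and a standard deviation of 1."""
    scores = np.asarray(scores, dtype=float)
    # Population standard deviation (ddof=0) is an assumption of this sketch.
    return (scores - scores.mean()) / scores.std()

# Hypothetical raw index values for five counties (not real EQI data).
raw = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
z = standardize(raw)

# Under the convention stated in the text, z > 0 marks higher-than-average
# environmental quality and z < 0 marks lower-than-average quality.
print(z)
```

Because the scores are kept continuous, a regression coefficient on such a z-score is read as the change in the outcome per one standard deviation of the index.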
The EQI measures were constructed based on data sources collected during 2000–2005. The EPA is currently working to update the EQI based on more recent data, but the new indices are not yet available [
11]. Therefore, this study used the 2005 data for physical inactivity prevalence and all other county-level characteristics.
The following county-level characteristics that were likely to correlate with residents’ physical activity level were adjusted for in the regression analyses. Annual average daily maximum temperature (°F), daily sunlight (kJ/m²) and daily precipitation (mm) came from the CDC WONDER systems [
13]. Population density (1000 persons per square mile), percentage of racial/ethnic minorities, percentage of high school and lower education, and percentage of households below the federal poverty level came from the U.S. Census Bureau [
14]. Data on annual unemployment rate came from the U.S. Bureau of Labor Statistics [
15].
2.2. Statistical Analysis
We performed GWR to examine the geographical variations in the environmental determinants of physical inactivity in the contiguous U.S. Conventional regressions such as ordinary least squares (OLS) are global models, which assume that study subjects residing in different neighborhoods are independent of each other [
16]. However, spatial features and their associated data values are often clustered together in space (i.e., positive spatial autocorrelation) or dispersed (i.e., negative spatial autocorrelation), which violates the assumption of statistical independence in OLS. Moreover, OLS estimates a single set of associations between the dependent variable and the independent variables, which implies spatial stationarity of the relationship. However, the characteristics of a particular area may affect the direction and magnitude of the relationship, which can then deviate from the global estimate. GWR relaxes the assumptions of statistical independence and spatial stationarity and produces a range of area-specific coefficients. GWR fits a separate regression for each spatial data point (e.g., county), in which observations closer to that data point receive a higher weight than those farther away. These distinct features make GWR particularly suitable for assessing spatial variations in the relationship between local environment and outcomes of interest.
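The idea of one distance-weighted regression per location can be sketched as follows. This is a minimal illustration, not the GWR4 implementation used in the study: the function name, the Gaussian kernel, the fixed bandwidth, and the synthetic data are all assumptions.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Fit one weighted least-squares regression per location.

    coords: (n, 2) point coordinates (e.g., county centroids).
    X: (n, k) predictors without an intercept column; y: (n,) outcome.
    bandwidth: fixed Gaussian kernel bandwidth, in coordinate units (assumed).
    Returns an (n, k + 1) array of local coefficients, intercept first.
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        # Nearby observations receive weights close to 1, distant ones close to 0.
        dist = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (dist / bandwidth) ** 2)
        xtw = design.T * w  # X'W, scaling each observation by its weight
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Synthetic example: the true slope drifts from -1 to +1 from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
slope = -1.0 + 0.2 * coords[:, 0]
y = 25.0 + slope * x + rng.normal(scale=0.1, size=200)

local = gwr_fit(coords, x[:, None], y, bandwidth=1.5)
print(local[:5])  # local intercepts and slopes for the first five points
```

With a very large bandwidth every observation receives nearly the same weight, so the local estimates collapse toward the single global OLS estimate. In practice the kernel and bandwidth are chosen by the software or the analyst, which this sketch does not attempt.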
We followed the three steps below in statistical analysis. First, we conducted two OLS regressions. The dependent variable in both models was county-level percentage of leisure-time physical inactivity. The key independent variable in the first OLS regression was the overall EQI, which was standardized to have a mean of zero and standard deviation of one (i.e., EQI z-score). The key independent variables in the second OLS regression were the five standardized EQI subdomains: air, water, land, built, and social environment. Both OLS regressions controlled for county-level characteristics including daytime temperature, sunlight, precipitation, population density, percentage of racial/ethnic minorities, percentage of high school and lower education, percentage of households below the federal poverty level, and unemployment rate. Second, we calculated the Moran’s I of the residuals estimated from the OLS regressions across U.S. counties. As a measure of spatial autocorrelation, a large and statistically significant Moran’s I would suggest the presence of substantial geographical variations in physical inactivity rates that were not explained by the OLS regressions, which would justify the use of GWR. Third, we performed two GWRs, with the overall EQI and the five EQI subdomains as their respective key independent variable(s). Both GWRs controlled for the same county-level characteristics. We then calculated the Moran’s I of the residuals estimated from the GWRs across U.S. counties. If the GWRs could better model geographical variations of physical inactivity rates than the OLS regressions, the Moran’s I of the residuals estimated from the GWRs should be substantially reduced in comparison to that from the OLS regressions. We also compared the R-squared between the GWRs and the OLS regressions. If the GWRs fitted the data substantially better than the OLS regressions did, the R-squared from the GWRs should be noticeably larger than that from the OLS regressions.
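The residual diagnostic in the second and third steps relies on Moran’s I. A minimal sketch of the statistic is given below; the function name, the toy chain of six locations, and the binary adjacency weights are assumptions for illustration, and the spatial weights actually used in the study are not specified here.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights: (n, n) spatial weights matrix with a zero diagonal.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    z = values - values.mean()
    n = len(values)
    return (n / weights.sum()) * (z @ weights @ z) / (z @ z)

# Toy layout: six locations in a row, each adjacent to its immediate neighbors.
n = 6
w = np.zeros((n, n))
for i in range(n - 1):
    w[i, i + 1] = w[i + 1, i] = 1.0

clustered = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])    # similar values adjoin
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # dissimilar values adjoin

print(morans_i(clustered, w))    # positive: spatial clustering
print(morans_i(alternating, w))  # negative: spatial dispersion
```

Applied to regression residuals, a value near zero suggests that little spatial structure is left unexplained by the model, which is the comparison the OLS and GWR residuals are subjected to.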
OLS regressions were performed using Stata 14.2 SE (StataCorp, College Station, TX, USA). GWRs were performed using GWR4.09. U.S. county maps were constructed using ArcGIS 10.5 (ESRI, Redlands, CA, USA).
2.3. Ethical Approval
This study used publicly available county-level aggregate data and involved no human subjects. Therefore, the study was exempted from human subject review by the corresponding author’s university institutional review board.
3. Results
Table 1 summarizes county-level characteristics. The prevalence of leisure-time physical inactivity among U.S. counties averaged 25% in 2005. Annual average daily maximum temperature was 65.3 °F, average daily sunlight 16.3 kJ/m², and average daily precipitation 2.6 mm. County-average population density was 250 persons per square mile. Racial/ethnic minorities accounted for 13.1% of county population, residents with high school or lower education 17.3%, and households below the poverty line 15.3%. Annual unemployment rates averaged 5.4% across U.S. counties in 2005.
Table 2 and
Table 3 report the estimated impacts of environmental quality on physical activity based on the OLS regressions. One standard deviation decrease in the overall EQI was found to be associated with an increase in county-level prevalence of leisure-time physical inactivity by nearly 1% (95% confidence interval (CI) = 0.81%, 1.15%) (
Table 2). A one-standard-deviation decrease in each of the EQI water, social, and built environment subdomains was associated with an increase in county-level prevalence of leisure-time physical inactivity of 0.41% (95% CI = 0.28%, 0.54%), 0.37% (95% CI = 0.03%, 0.70%), and 0.65% (95% CI = 0.51%, 0.79%), respectively (
Table 3). EQI air and land subdomains were not found to be associated with county-level physical inactivity (
p-values ≥ 0.05). Annual average daily maximum temperature and precipitation, percent of racial/ethnic minorities, education of high school and lower, and households below the federal poverty level, and annual unemployment rate were positively associated with county-level physical inactivity; whereas annual average daily sunlight was negatively associated with physical inactivity.
Table 4 and
Table 5 report the estimated impacts of environmental quality on physical activity based on the GWRs. Substantial geographical variations in the estimated environmental determinants of physical inactivity were present. The estimated changes in county-level prevalence of leisure-time physical inactivity resulting from a one-standard-deviation decrease in the overall EQI ranged from an increase of over 3% to a decrease of nearly 2% across U.S. counties (
Table 4). Statistically significant inverse associations between the overall EQI and physical inactivity rate accounted for over 20% of all county-specific coefficient estimates, whereas statistically significant positive associations accounted for merely 1%. The estimated changes in county-level prevalence of leisure-time physical inactivity resulting from a one-standard-deviation decrease in the EQI air, water, land, social, and built environment subdomains ranged from an increase of 2.6%, 1.5%, 2.9%, 3.3%, and 1.7% to a decrease of 2.9%, 1.4%, 2.4%, 2.4%, and 0.8% across U.S. counties, respectively (
Table 5). Statistically significant inverse associations between the EQI air, water, land, social, and built environment subdomains and physical inactivity rate accounted for 6%, 20%, 12%, 29%, and 21% of all county-specific coefficient estimates, whereas statistically significant positive associations accounted for 16%, 14%, 5%, 1%, and 1%, respectively.
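Shares of this kind can be tabulated from the county-specific coefficients and their standard errors. The sketch below is illustrative only: the function name, the toy numbers, and the two-sided critical value of 1.96 (roughly p < 0.05) are assumptions, and it does not reproduce how GWR4 reports local significance.

```python
import numpy as np

def significance_shares(coefs, std_errs, critical=1.96):
    """Share of local coefficients that are significantly negative or positive."""
    coefs = np.asarray(coefs, dtype=float)
    t = coefs / np.asarray(std_errs, dtype=float)
    significant = np.abs(t) > critical  # assumed two-sided threshold
    return {
        # Negative EQI coefficient: better environment, less inactivity.
        "inverse": float(np.mean(significant & (coefs < 0))),
        "positive": float(np.mean(significant & (coefs > 0))),
    }

# Hypothetical local EQI coefficients and standard errors for five counties.
coefs = np.array([-1.2, -0.3, 0.1, 0.9, -0.8])
std_errs = np.array([0.4, 0.3, 0.2, 0.3, 0.5])

shares = significance_shares(coefs, std_errs)
print(shares)
```

Counties whose coefficient is not significant in either direction make up the remaining share.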
Figure 1 and
Figure 2 map the estimated county-specific impact of the overall EQI and its subdomains on the prevalence of leisure-time physical inactivity using GWRs.
Moran’s I of the residuals estimated from the OLS regressions were 0.092 (95% CI = 0.090, 0.093) for the overall EQI and 0.089 (95% CI = 0.088, 0.090) for the five EQI subdomains, denoting substantial geographical variations across U.S. counties that were not explained by these global models. In comparison, Moran’s I of the residuals estimated from the GWRs were reduced to −0.0003 (95% CI = −0.001, 0.001) for the overall EQI and 0.0002 (95% CI = −0.001, 0.001) for the five EQI subdomains. The R-squared increased from 0.58 based on the OLS regression to 0.87 based on the GWR with the overall EQI as the key independent variable. Analogously, the R-squared increased from 0.58 based on the OLS regression to 0.86 based on the GWR with the five EQI subdomains as the key independent variables.
4. Discussion
This study assessed the geographical variations in the environmental determinants of physical inactivity among U.S. adults. Prevalence of leisure-time physical inactivity was matched to the overall EQI and its five subdomains by residential county. GWRs were performed to estimate county-specific associations between environmental quality and physical inactivity rate, adjusting for various county-level characteristics. Substantial geographical variations in the estimated environmental determinants of physical inactivity were revealed.
In general, this study confirmed an inverse relationship between environmental quality and physical inactivity documented in the previous literature [
17,
18,
19]. The magnitude of the association is moderate, however: a one-standard-deviation decrease in the overall EQI was on average associated with an increase in county-level prevalence of leisure-time physical inactivity of approximately 1%. Moreover, the negative impact of compromised environmental quality on physical activity was not uniformly distributed across geographical areas but concentrated in about 20% of U.S. counties. Those counties tended to cluster within a few states such as California, Florida, Montana, North Dakota and South Dakota. Among the five environmental subdomains, built and social environment tended to exert the largest impacts. This coincides with the body of evidence that predominantly concentrates upon the influence of built environment and neighborhood social environment on physical activity engagement [
18,
20]. Moreover, these two EQI subdomains shared fairly similar geographical variations in their estimated impact: the counties most influenced by built and social environment tended to be located in a few states such as Minnesota, Montana, Utah, Colorado, and California. Accessibility and quality of drinking and recreational water, as measured by the EQI water subdomain, and to a lesser extent, air and land conditions, were also found to be linked to physical inactivity. However, the geographical distributions of their respective impacts differed substantially from each other and also noticeably diverged from those of built and social environment.
Given the substantial heterogeneity in the estimated environmental determinants of physical inactivity, customized policy interventions that address the specific and most concerning environmental issues in a local area could be more effective (and cost-effective) than a nationwide universal intervention. While evidence-based interventions addressing various environmental attributes have provided a rich set of policy options, local governments and stakeholders should carefully assess their specific situation and choose the option that best meets their individual needs [
21]. For instance, in an area where air pollution is the major deterrent to physical activity, pollution control policies should be prioritized over park and bike lane construction in order to promote a more active lifestyle. Moreover, given limited resources, local governments should prioritize their policy options based on a cost-benefit calculation. While some of the environmental determinants are more difficult and/or expensive to change (e.g., poverty, landscaping), others are relatively less costly and/or resource consuming (e.g., air and water quality monitoring). Governments may invest more in environmental interventions that provide a larger marginal return per dollar spent.
GWR has been increasingly used to examine spatially varying patterns and relationships in both chronic conditions (e.g., diabetes) and communicable diseases (e.g., malaria) [
16,
22]. The major advantage of GWR over a global model such as OLS is that it explicitly models spatial autocorrelation and produces location-specific estimates (as well as their respective uncertainty measures, e.g., standard error, CI, and
p-value). Spatial dependency is a testable hypothesis (e.g., using Moran’s I), and in its absence, GWR is reduced to an OLS regression. In this study, the Moran’s I of the residuals was substantially reduced and became statistically non-significant when replacing OLS with GWR, suggesting that most, if not all, of the spatial heterogeneity in the OLS-estimated environmental determinants of physical inactivity could be accounted for by GWR. In addition, the substantial increase in the R-squared also reflects the improved goodness of fit of GWR relative to OLS.
This study is the first to examine geographical variations in the environmental determinants of physical inactivity among U.S. adults. Measures of environmental quality were comprehensive and constructed based on a large pool of authoritative data sources. GWR illustrated spatially varying relationships between environmental attributes and physical inactivity that could not be revealed using conventional regression approaches. Nevertheless, a few limitations of the study warrant caution. Leisure-time physical activity in the BRFSS was based on self-report and is prone to recall error and social desirability bias [
23]. The sampling design of BRFSS enables state-representative health indicator estimates, but the representativeness in general does not extend to local estimates below the state level (e.g., county or city). A more recent EQI and its subdomains based on 2010 Census data are under construction but are not yet available, which prevented us from using more recent BRFSS surveys. Arguably, 12 years have passed since the EQI and physical inactivity data were collected, and the estimated county-specific relationships between EQI and physical inactivity might have changed over time. However, in the absence of up-to-date data, no reliable projections could be made regarding the long-term trajectory for the local variations of physical inactivity in relation to environmental quality. The EQI measures are comprehensive but abstract, so we are unable to differentiate the influence of each specific contributing factor (e.g., park versus bike lane, crime versus housing). This limits our ability to provide more specific county-level policy recommendations. Once more recent physical inactivity and EQI data become available, future studies may adopt a longitudinal study design and examine the change of physical inactivity patterns in response to the change of environmental quality across U.S. counties.