Article
Peer-Review Record

Satellite-Derived PM2.5 Composition and Its Differential Effect on Children’s Lung Function

Remote Sens. 2020, 12(6), 1028; https://doi.org/10.3390/rs12061028
by Khang Chau *,†, Meredith Franklin † and W. James Gauderman
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 26 February 2020 / Revised: 19 March 2020 / Accepted: 19 March 2020 / Published: 23 March 2020

Round 1

Reviewer 1 Report

This study developed spatio-temporal models for ambient air pollutants, specifically PM2.5 and its species (SO4, NO3, EC, and dust) using satellite-derived predictors and studied the association of air pollutants with children’s lung function in California. The study in general is well conducted and the manuscript is well written. I found this work very interesting and enjoyed reading it. I have the following comments before its publication:

  • The authors stated that "kriging, smoothing, and LUR ... yield severe biases and underestimation of standard errors in the health effects studies." Could you please clarify why or how? Even in your study, it seems that there is some residual yet to be explained, and the performance has not been compared with the other above-mentioned methods.
  • Section 2.1 on measurements needs improved presentation. Did you use daily means or annual means? It is not clear to me which measure of the pollutants is used. It is also not clear whether you used all ground-measured data from CSN and EPA, or measurements from other sources as well, e.g., IMPROVE.
  • I was expecting a stronger correlation between central-site PM2.5 and EC. Is this at cohort baseline or over all years?
  • Why is dust limited to aluminum, calcium, iron, silicon, and titanium? Do these elements explain dust over all years of your study, i.e., has their composition not changed over time?
  • It is not clear whether you used the 74 MISR AOD mixtures or not.
  • Is there any reason why AOD products were typically twice as abundant as AOD mixtures in coverage over California compared to Mongolia?
  • Figure 1 shows the point locations of children. Does the IRB allow such data to be disclosed to Google? Please state in the text that the study was approved by an appropriate ethics committee. What do the point colors refer to?
  • How many subjects were in your study for the recent cohort? Was that also 11,000?
  • It is not clear how the hyperparameters were tuned for each learner. How did you ensure that the models are not overfitted?
  • Is it possible to add variable importance (Figure A4) for the best learners for each pollutant to the main text?
  • What were the spatial R2, temporal R2, and overall R2 of each learner? Is the R2 in Table 1 the overall R2?
  • How could height and race be confounders? Assuming that they are associated with lung function, are they also associated with exposures? If possible, I suggest creating a DAG to identify the confounding variables.
  • Which cut-off for VIF was considered to judge collinearity?
  • What is your interpretation that NO3 is associated with FVC but not with FEV1? Also, why is PM2.5 associated with FEV1 but not with FVC?

Author Response

We thank the reviewer for their positive feedback as well as the insightful comments and questions. We have provided our responses below:

 

This study developed spatio-temporal models for ambient air pollutants, specifically PM2.5 and its species (SO4, NO3, EC, and dust) using satellite-derived predictors and studied the association of air pollutants with children’s lung function in California. The study in general is well conducted and the manuscript is well written. I found this work very interesting and enjoyed reading it. I have the following comments before its publication:

 

The authors stated that "kriging, smoothing, and LUR ... yield severe biases and underestimation of standard errors in the health effects studies." Could you please clarify why or how? Even in your study, it seems that there is some residual yet to be explained, and the performance has not been compared with the other above-mentioned methods.

 

We refer to Alexeeff et al. (2014) when stating that estimates from kriging, smoothing, and LUR can lead to severe biases and underestimation of standard errors in health effects studies. In their paper they compared kriging and LUR estimates of PM2.5 in terms of how they affect the standard errors in acute and chronic health effects studies. For example, kriging of daily data resulted in predicted exposure R2 of 0.25 to 0.36. Using these exposures in chronic health effects models, the bias in the health effect estimates ranged from 4% (upward) to 15% (downward). The standard errors of the health effects estimates under these exposures were severely underestimated, with 95% confidence interval coverage of only 24% to 45%. This indicates significant bias and severe underestimation of the variance in the health effects estimates. For their LUR models, which included AOD, the R2 for the exposure models was 0.71 to 0.84. The subsequent (upward) bias in chronic health effects with these exposures was much smaller, 1% to 5%, and the coverage was 48-68%, still a severe underestimate.

 

Our exposure models have R2 of 0.53 (dust) to 0.71 (sulfate and nitrate), so they would be comparable to the LUR models of Alexeeff et al. (2014). Nevertheless, we agree that there is unexplained residual variance in our exposure models, which could lead to biases and underestimation of standard errors in the epidemiological assessment. We have added this limitation to the discussion as follows:

“Nevertheless, our exposure prediction models are not without unexplained residual variance; our best models had CV R$^2$ from 0.53 (dust) to 0.71 (sulfate, nitrate). As noted by \citet{Alexeeff2015}, there can be 1-5\% upward bias in subsequent health effects estimates when exposure predictions have performance statistics in the range we observed, and their standard errors may be underestimated. It is difficult to mitigate these issues due to imperfect exposure models, but it is worth keeping in mind while interpreting our epidemiological results.” (Lines 279-285 of the revised manuscript)

 

We have also clarified the sentence in the introduction:

“While these approaches are valuable for generating exposures with greater spatial coverage, it has been shown that if their prediction performance is poor, subsequent epidemiological studies can yield severe biases and underestimation of standard errors in the health effects estimates \cite{Alexeeff2015}.” (Lines 38-40 of the revised manuscript)

 

Alexeeff, S. E., Schwartz, J., Kloog, I., Chudnovsky, A., Koutrakis, P., & Coull, B. A. (2014). Consequences of kriging and land use regression for PM2.5 predictions in epidemiologic analyses: insights into spatial variability using high-resolution satellite data. Journal of Exposure Science & Environmental Epidemiology. https://doi.org/10.1038/jes.2014.40

 

 

Section 2.1 on measurements needs improved presentation. Did you use daily means or annual means? It is not clear to me which measure of the pollutants is used. It is also not clear whether you used all ground-measured data from CSN and EPA, or measurements from other sources as well, e.g., IMPROVE.

 

We improved the wording of this section to clarify the data that we used. We used daily means or 24-hour integrated samples of PM2.5. All speciation data are daily. We did not use data from other sources such as IMPROVE or CARB. Finally, we refer to the websites where we downloaded the EPA data in the references. (Lines 74-86 of the revised manuscript)

 

I was expecting a stronger correlation between central-site PM2.5 and EC. Is this at cohort baseline or over all years?

 

The reported correlations in Table A1 were the correlations between annual means of the pollutants among the subjects only during the final follow-up period (2011–2012). There are only 8 unique central-site values per year, and the spatial variability in the MISR-derived EC is likely driving the poor correlation with PM2.5. In addition, the central-site PM2.5 was calculated for a fixed period of 12 months for all subjects, while the MISR-derived pollutants were estimated for the 12 months prior to each participant's assessment visit.

 

We have updated Table A1 to show the Spearman correlations for 2007–2012, covering the 3 follow-up periods (same as the maps, although the epidemiological models only concerned 2011-2012). (See Table A1 in the revised manuscript.)
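
A minimal sketch of how such a correlation table can be computed is below (Python/pandas on synthetic data; the column names and values are illustrative, not the study's):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 200  # synthetic "subjects"; the real table uses cohort annual means

    # Hypothetical annual-mean exposures per subject (names are ours).
    exposures = pd.DataFrame({
        "central_pm25": rng.gamma(5.0, 2.0, n),
        "misr_pm25": rng.gamma(5.0, 2.0, n),
        "misr_so4": rng.gamma(3.0, 0.8, n),
        "misr_no3": rng.gamma(3.0, 1.0, n),
        "misr_ec": rng.gamma(2.0, 0.5, n),
    })

    # Spearman rank correlations across pollutant pairs, as in Table A1.
    print(exposures.corr(method="spearman").round(2))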

 

Why is dust limited to aluminum, calcium, iron, silicon, and titanium? Do these elements explain dust over all years of your study, i.e., has their composition not changed over time?

 

We use the term dust for "geologic materials" or "fugitive soil dust" as defined by Chow et al. (2015). In their earlier source apportionment study of the California Central Valley, Chow et al. (2003) noted that fugitive soil dust in PM10 was largely composed of these elements. They also noted that the composition did not change spatially between sites in the Central Valley, or over two time periods of measurement (1987 and 1997).

 

The equation shown in our manuscript is from Chow et al. (2015), which is a review paper, and has been used by others. We use this equation over all years of the study, and while it may be better suited to PM10, we apply it only to PM2.5, as that is the size fraction available through the EPA's Chemical Speciation Network.
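
For reference, the widely used IMPROVE-style soil reconstruction reviewed in Chow et al. (2015) has the following form, with coefficients accounting for the common oxide compounds of each element (we are assuming this standard formulation here; the manuscript's exact equation should be consulted):

    \mathrm{DUST} = 2.20\,[\mathrm{Al}] + 2.49\,[\mathrm{Si}] + 1.63\,[\mathrm{Ca}] + 2.42\,[\mathrm{Fe}] + 1.94\,[\mathrm{Ti}]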

 

We clarified the text with the following additional sentence:
“This definition of dust pertains to fugitive geological materials and has been shown to have a stable compositional source profile over time in California \cite{Chow2003}.” (Lines 89-90 of the revised manuscript)

 

Chow, J. C., Watson, J. G., Ashbaugh, L. L., & Magliano, K. L. (2003). Similarities and differences in PM10 chemical source profiles for geological dust from the San Joaquin Valley, California. Atmospheric Environment, 37(9–10), 1317–1340. https://doi.org/10.1016/S1352-2310(02)01021-X

 

Chow, J. C., Lowenthal, D. H., Chen, L. W. A., Wang, X., & Watson, J. G. (2015). Mass reconstruction methods for PM2.5: a review. Air Quality, Atmosphere and Health, 8(3), 243–263. https://doi.org/10.1007/s11869-015-0338-3

 

 

It is not clear whether you used the 74 MISR AOD mixtures or not.

 

All 74 AOD mixtures were used in the fitting process. We clarified the text to better explain that we fit models separately on the products and the mixtures. Please also refer to Appendix Figure A3 to see all model results:

 

“Five machine learning methods were considered: Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), Gradient Boosting (GBM), Random Forests (RF), and Support Vector Machines (SVM), all within a regression setting \cite{James2013}. Inputs to the models were meteorology and either the MISR products or the 74 MISR mixtures. The optimal model for each pollutant was chosen based on its test $R^2$ as the primary metric and its test RMSE as the secondary metric. We further supplemented the model with geospatial (coordinates of MISR pixels projected to UTM zone 11) and temporal (Julian date and month) predictors. The best predicting model for each pollutant was trained on the full dataset prior to estimating exposures for the epidemiological assessment.” (Lines 145-153 of the revised manuscript)

 

Is there any reason why AOD products were typically twice as abundant as AOD mixtures in coverage over California compared to Mongolia?

 

Yes, there is a reason for this, which relates to the MISR retrieval algorithm. Mixtures do not have the same strict level of cloud or surface brightness screening as the main AOD products, so in Mongolia, where retrievals are nearly always contaminated with cloud or snow in the wintertime, more mixtures were retrieved.

 

At the same time, for a successful mixture retrieval all 74 mixtures must be complete for them to be reported in the dataset (meaning the 8 MISR particle types from which the 74 mixtures are derived must be simultaneously retrieved successfully). In California there are far fewer retrieval issues due to cloud and snow and therefore the main AOD products are more abundantly retrieved than the 74 mixtures.

 

As the following paragraph of Section 2.2 of the text is unnecessary to the content of the current manuscript, we have removed it:

“MISR AOD products and AOD mixtures unfortunately do not have the same retrieval success rate. In a previous study of MISR AOD over Mongolia \cite{Franklin2018}, we observed more pixels with complete AOD mixtures data than those with complete AOD products data, while in this study, AOD products were typically twice as abundant as AOD mixtures in coverage over California. For our analyses, we only matched air monitoring sites to MISR pixels with at least AOD products or AOD mixtures data complete.”

 

The following sentence in Section 2.5 has also been removed, as it similarly does not provide useful information for the current manuscript:

“Over Mongolia, we were able to distinguish MISR AOD mixtures that contributed to SO$_2$ and total PM mass and, using machine learning methods, generated reliable predictions over the Ulaanbaatar metropolitan area \cite{Franklin2018}. We took a similar approach here in predicting PM$_{2.5}$, \sulfatens, \nitratens, EC, and dust using machine learning methods.”

 

 

Figure 1 shows the point locations of children. Does the IRB allow such data to be disclosed to Google? Please state in the text that the study was approved by an appropriate ethics committee. What do the point colors refer to?

 

Figure 1 was created using the ggmap package in R, which downloaded the base maps of interest locally using the Google Maps API. We then added the subject locations to the base map images. No subject data were transferred online or otherwise disclosed to Google in the process. The point colors correspond to the communities that the children were recruited from and are denoted in the legend at the bottom of the figure.

 

We have included a statement about how the study was approved by the USC Institutional Review Board (IRB):

“Study protocols were approved by the Institutional Review Board at the University of Southern California (USC), and additional details of CHS community and subject selection have been previously reported \cite{Peters1999,McConnell2006}.” (Lines 127-129 of the revised manuscript)

 

How many subjects were in your study for the recent cohort? Was that also 11,000?

 

No, 11,000 is the total for the entire CHS study over all cohort years. For the cohort we examined, there were approximately 3,000 subjects starting in 2003, and approximately 1,200 in the cross-section we examined (2011-2012) for whom lung function was measured. We clarified that approximately 3,000 children were enrolled in the last cohort:

“In this study, we focused on the most recent cohort, which began in 2003, enrolling approximately 3,000 children at age 6--7 years and following them until 2012, when they were 15--16 years old.” (Lines 114-117 of the revised manuscript)

 

It is not clear how the hyperparameters were tuned for each learner. How did you ensure that the models are not overfitted?

 

The hyperparameters were tuned using 5-fold cross-validation with the mean out-of-sample (OOS) R2 and RMSE as the main indicators. For example, in the gradient boosting models, we performed a grid search on the number of trees K, from 500 to 5000 in increments of 500. Overfitted iterations would typically show poorer mean OOS R2/RMSE as the number of trees increases. If the best mean OOS R2/RMSE were at K=5000, we would attempt higher values of K, as the optimal value might not have been found. We took the hyperparameter values from the iteration with the best mean OOS R2/RMSE, predicted on a 'test' sample that the model had not seen, and assessed test R2 and RMSE to compare learners.
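
A minimal sketch of this tuning workflow is below (Python/scikit-learn on synthetic data; the authors' actual software, predictors, and grids may differ):

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import GridSearchCV, train_test_split

    # Synthetic stand-in for the matched MISR/meteorology/PM2.5 records.
    X, y = make_regression(n_samples=2000, n_features=20, noise=10.0,
                           random_state=0)

    # Hold out a test sample that tuning never sees (the paper used 30%).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # 5-fold CV grid search over the number of trees K,
    # from 500 to 5000 in increments of 500.
    grid = GridSearchCV(
        GradientBoostingRegressor(random_state=0),
        param_grid={"n_estimators": list(range(500, 5001, 500))},
        cv=5,
        scoring="r2",
    )
    grid.fit(X_train, y_train)

    # Test R2 and RMSE on the held-out sample, used to compare learners.
    pred = grid.best_estimator_.predict(X_test)
    print(grid.best_params_,
          r2_score(y_test, pred),
          mean_squared_error(y_test, pred) ** 0.5)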

 

Is it possible to add variable importance (Figure A4) for the best learners for each pollutant to the main text?

 

We have moved the variable importance figures to the main text, now as Figures 3 and 4 of the revised manuscript. We have also added text to the Results section 3.1 text on the variable importance:

“The most important variables for PM$_{2.5}$ include an interpretable mix of AOD, small and medium AOD, as well as meteorological variables (surface shortwave radiation, wind speed, and temperature) (Figure \ref{fig:pm25-featimp}). Similar variables were important for \sulfate and \nitratens, but non-spherical AOD played a larger role in both (Figure \ref{fig:csn-featimp}). Interestingly, AODs only ranked 8th and 9th for EC, with meteorology and temporal indicators playing a larger role in its prediction. Finally, dust was predicted only by AOD mixtures relating primarily to dust (mixtures 70 and 53) and non-absorbing (mixtures 4, 12, 13, 19, 21) particles.” Please see lines 196-202 of the revised manuscript.

 

What were the spatial R2, temporal R2, and overall R2 of each learner? Is the R2 in Table 1 the overall R2?

 

The R2 reported in Table 1 is the best test R2 achieved by each ML method for each pollutant in the model tuning process. (Test R2 for each method is the R2 assessed on the 30% of the sample which was set aside during model tuning.) For PM2.5 mass, the reported R2 is the test R2 achieved by the gradient boosting model, which outperformed other learners. For PM2.5 speciation pollutants, as their sample sizes were smaller, we trained each learner 20 times to assess performance stability, and the reported R2 is the highest test R2 achieved by the respective learner, not the overall R2.

 

With almost 160 PM2.5 sites in California over the 19-year period, leave-one-out model tuning to assess spatial R2 was too time-consuming due to the large sample size, and data availability varied greatly by site as sites were added or discontinued over time.

 

How could height and race be confounders? Assuming that they are associated with lung function, are they also associated with exposures? If possible, I suggest creating a DAG to identify the confounding variables.

 

Only SO4 and NO3 were statistically significantly associated with height (p = 0.004 and 0.001, respectively), although the magnitudes of the correlations were fairly small (r = -0.08 and -0.09, respectively). Therefore, height is a confounder for the relationship between these pollutants and FEV1 and FVC.

 

Race is also significantly associated with all exposures except central-site PM2.5 (p < 0.001 on Kruskal-Wallis tests for all MISR-derived exposures). However, the distribution of race groups is significantly associated with the study communities (p < 0.001 on a Chi-square test of independence). For example, there were more Hispanic children in Anaheim, Mira Loma, and Santa Barbara and more white children in Glendora, San Dimas, and Upland. As the exposure pollutants are associated with the study communities as part of the study design, it could be argued that community is a confounder in the relationship between race and the pollutants. Therefore, assessing the confounding effect of race is less straightforward in our study.
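
The two tests mentioned above can be illustrated as follows (Python/scipy on synthetic data; the real analysis used the cohort's race, community, and exposure variables):

    import numpy as np
    from scipy.stats import chi2_contingency, kruskal

    rng = np.random.default_rng(1)

    # Kruskal-Wallis: does an exposure differ across four race groups?
    exposure_by_group = [rng.gamma(3.0, 1.0, 100) for _ in range(4)]
    stat, p_kw = kruskal(*exposure_by_group)

    # Chi-square test of independence: race (rows) by community (columns).
    race_by_community = rng.integers(10, 60, size=(4, 8))
    chi2, p_chi, dof, expected = chi2_contingency(race_by_community)

    print(p_kw, p_chi)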

 

Which cut-off for VIF was considered to judge collinearity?

 

We used the cutoff of GVIF ≤ 10, although in our final models, all GVIF ≤ 3. We have added the following to Section 2.6:

“Generalized variance inflation factor was calculated to assess potential collinearity among the predictors (using GVIF $\le 10$ as the cutoff).”

 

and the following to the Results:

“Collinearity was not an issue in any of the single- or multi-pollutant models with two pollutants (all GVIF $\le 3$).”

(Lines 171-173 and lines 225-226 of the revised manuscript, respectively)
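
As an illustration, variance inflation factors for a simple design matrix can be computed as below (Python/statsmodels on synthetic data; variable names are ours). Note that the GVIF of Fox and Monette reduces to the ordinary VIF for single-degree-of-freedom terms; the generalization matters for multi-level factors such as community:

    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools import add_constant

    rng = np.random.default_rng(2)

    # Hypothetical predictors from a two-pollutant model (names are ours).
    X = pd.DataFrame({
        "no3": rng.normal(size=500),
        "pm25": rng.normal(size=500),
        "height": rng.normal(size=500),
    })
    X = add_constant(X)

    # One VIF per predictor; the intercept column is skipped.
    for i, name in enumerate(X.columns):
        if name != "const":
            print(name, variance_inflation_factor(X.values, i))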

 

What is your interpretation that NO3 is associated with FVC but not with FEV1? Also, why is PM2.5 associated with FEV1 but not with FVC?

 

It is not unusual in epidemiological assessments to have mixed results in terms of statistical significance. We note that all of our effect estimates for FEV1 and FVC are in the negative direction, and that we declare an association only if it reaches the strict threshold of a p-value less than or equal to 0.05.

 

With that in mind, although PM2.5 is significantly associated with FEV1 but not FVC, the effect sizes are similar (-131 vs. -122, respectively), and the p-value of 0.103 for FVC indicates a marginal association, which is also reflected in the 95% confidence interval (-260, 25). The effect estimate is very similar, but it did not meet our criterion of α ≤ 0.05. A previous study of the CHS, albeit longitudinal (Gauderman et al. 2004), found a similar phenomenon: FEV1 was significantly associated (at α ≤ 0.05) with central-site PM2.5 [-79.7 (-153.0, -6.4)] but FVC was only marginally significant [-60.1 (-166.1, 45.9)].

 

As for NO3, we see decreased lung function with both pulmonary function tests, but the FEV1 estimate has a large p-value of 0.45. There have not been many other similar studies examining nitrate and lung function, so it is difficult to suggest exactly what might be driving this difference. In a recent paper, Bose et al. (2018) found mixed results in their examination of prenatal nitrate exposure and lung function, where Bayesian distributed lag models suggested significant exposure windows of susceptibility, but the associations between NO3 and FEV1 and FVC did not reach statistical significance at the α ≤ 0.05 level (although both had negative effect estimates).

 

Gauderman, W. J., Avol, E., Gilliland, F., Vora, H., Thomas, D., Berhane, K., McConnell, R., Kuenzli, N., Lurmann, F., Rappaport, E., Margolis, H., Bates, D., & Peters, J. (2004). The effect of air pollution on lung development from 10 to 18 years of age. New England Journal of Medicine, 351(11), 1057–1067.

 

Bose, S., Rosa, M. J., Mathilda Chiu, Y. H., Leon Hsu, H. H., Di, Q., Lee, A., Kloog, I., Wilson, A., Schwartz, J., Wright, R. O., Morgan, W. J., Coull, B. A., & Wright, R. J. (2018). Prenatal nitrate air pollution exposure and reduced child lung function: Timing and fetal sex effects. Environmental Research, 167, 591–597. https://doi.org/10.1016/j.envres.2018.08.019

Author Response File: Author Response.pdf

Reviewer 2 Report

This study looks into the relationship between PM2.5, AOD, and health. Though there have been strides in reducing the uncertainty when relating AOD to PM2.5, there is still much work to be done. This, coupled with the limitations of using machine-learning-based methods to relate pollution to negative health effects, makes the task a tedious one. The authors took great care in explaining the model simulation methods and were able to support their results with robust statistics. They did find a significant relationship between sulfates and nitrates and decreased lung capacity, which is in line with other studies on air quality impacts on cardiopulmonary health.

My only concerns are: resolution differences between MISR and the ground-based monitors, and how representative each is with respect to the population density of the Southern California cities; the use of AOD vs. AOD mixtures and the mixed results discussed in the Conclusion section; and poor prediction performance, which seems to be an issue with any machine learning technique due to training data issues. In the future, these issues must be overcome if machine learning is to be a viable technique.

Aside from a few grammar issues, the study appears to be sound. Please re-read and make sure to check for subject-verb agreement.

Author Response

We thank the reviewer for their positive feedback and thoughtful concern. We have provided our responses below:

 

This study looks into the relationship between PM2.5, AOD, and health. Though there have been strides in reducing the uncertainty when relating AOD to PM2.5, there is still much work to be done. This, coupled with the limitations of using machine-learning-based methods to relate pollution to negative health effects, makes the task a tedious one. The authors took great care in explaining the model simulation methods and were able to support their results with robust statistics. They did find a significant relationship between sulfates and nitrates and decreased lung capacity, which is in line with other studies on air quality impacts on cardiopulmonary health.

 

My only concerns are: resolution differences between MISR and the ground-based monitors, and how representative each is with respect to the population density of the Southern California cities;

 

We appreciate this concern, as the ground monitors are not randomly placed. However, calibrating to these monitors and using MISR AOD, which has far better spatial coverage, should mitigate any possible misrepresentation of the monitoring network with respect to the population density of Southern California cities.

We have made a note of this limitation in the discussion as follows:

“One limitation in our PM$_{2.5}$ speciation prediction models is the scarcity of data. As the number of CSN sites in California increased from 3 in 2000 to 19 in 2013 and decreased to 16 in 2018 (California PM$_{2.5}$ mass sites increased from 95 in 2000 to 157 in 2018), spatial coverage was certainly restricted (Figure \ref{fig:epa-sites}). Furthermore, the locations of these sparsely available monitors are not necessarily representative of the population density of Southern California.” (Lines 258-262 of the revised manuscript)

 

the use of AOD vs. AOD mixtures and the mixed results discussed in the Conclusion section;

 

We clarified the conclusion section to indicate that we used both aerosol AOD and its properties with the following text:

“We have shown in this study that MISR AOD observations distinguishing size, shape, absorption and mixture properties can aid in predicting PM$_{2.5}$ and its chemical speciation including \sulfatens, \nitratens, EC, and dust, particularly when supplemented with spatiotemporal information and high resolution meteorological data.” (Lines 315-318 in the revised manuscript)

 

and poor prediction performance, which seems to be an issue with any machine learning technique due to training data issues.

 

Prediction performance was in fact better with machine learning. We conducted a secondary analysis using regression models with flexible smooth functions (generalized additive models, GAMs), as have been used in several previous studies, including those integrating MISR and ground monitoring (e.g., Franklin et al. 2017, Meng et al. 2018). With similar predictor variables, GAMs performed worse than the best machine learning model (GBM), with test R2 = 0.41 and test RMSE = 6.85 ug/m3.
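
For context, a GAM baseline of this kind can be sketched as follows (Python with the pyGAM package on synthetic data; the original comparison likely used different software and the actual MISR and meteorology predictors):

    import numpy as np
    from pygam import LinearGAM, s
    from sklearn.datasets import make_regression
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=1000, n_features=4, noise=15.0,
                           random_state=3)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=3)

    # One smooth term per predictor, analogous to the spline-based
    # GAMs used in earlier MISR-PM2.5 studies.
    gam = LinearGAM(s(0) + s(1) + s(2) + s(3)).fit(X_train, y_train)

    # Test R2 and RMSE, comparable to the GBM metrics reported above.
    pred = gam.predict(X_test)
    print(r2_score(y_test, pred), mean_squared_error(y_test, pred) ** 0.5)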

 

In the future, these issues must be overcome if machine learning is to be a viable technique.

 

We agree; however, we believe machine learning is an improvement over more traditional regression modeling techniques (see above), particularly with highly correlated predictor variables such as the MISR products and mixture AODs. Machine learning methods have also proven their capability in handling high-dimensional data (such as the 74 AOD mixtures in our study).

 

We have added a sentence to the discussion addressing that our models are not perfect and that there is still room for improvement:

“Nevertheless, our exposure prediction models are not without unexplained residual variance; our best models had CV R$^2$ from 0.53 (dust) to 0.71 (sulfate, nitrate). As noted by \citet{Alexeeff2015}, there can be 1-5\% upward bias in subsequent health effects estimates when exposure predictions have performance statistics in the range we observed, and their standard errors may be underestimated. It is difficult to mitigate these issues due to imperfect exposure models, but it is worth keeping in mind while interpreting our epidemiological results.” (Lines 279-285 of the revised manuscript)

 

We also added a sentence to the conclusions:

“Epidemiological assessments will only be made more viable, particularly as the quality of remote sensing data and estimation models continue to improve and exposure measurement error decreases.” (Lines 324-326 of the revised manuscript)

 

Aside from a few grammar issues, the study appears to be sound. Please re-read and make sure to check for subject-verb agreement.

 

Thank you, we have re-read carefully and made necessary grammar adjustments.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

.
