A machine learning method to estimate PM2.5 concentrations across China with remote sensing, meteorological and land use information

Gongbo Chen, Shanshan Li, Luke D. Knibbs, N. A.S. Hamm, Wei Cao, Tiantian Li, Jianping Guo, Hongyan Ren, Michael J. Abramson, Yuming Guo

Research output: Journal Publication › Article › peer-review



Background: Machine learning algorithms have very high predictive ability. However, no study has used machine learning to estimate historical concentrations of PM2.5 (particulate matter with aerodynamic diameter ≤ 2.5 μm) at a daily time scale in China at a national level.

Objectives: To estimate daily concentrations of PM2.5 across China during 2005–2016.

Methods: Daily ground-level PM2.5 data were obtained from 1479 stations across China during 2014–2016. Data on aerosol optical depth (AOD), meteorological conditions and other predictors were downloaded. A random forests model (a non-parametric machine learning algorithm) and two traditional regression models were developed to estimate ground-level PM2.5 concentrations. The best-fit model was then used to estimate the daily concentrations of PM2.5 across China at a resolution of 0.1° (≈10 km) during 2005–2016.

Results: The daily random forests model showed much higher predictive accuracy than the two traditional regression models, explaining the majority of spatial variability in daily PM2.5 [10-fold cross-validation (CV) R² = 83%, root mean squared prediction error (RMSE) = 28.1 μg/m³]. At the monthly and annual time scales, the explained variability of average PM2.5 increased up to 86% (RMSE = 10.7 μg/m³ and 6.9 μg/m³, respectively).

Conclusions: Taking advantage of a novel application of a modeling framework and the most recent ground-level PM2.5 observations, the machine learning method showed higher predictive ability than previous studies.

Capsule: A random forests approach can be used to estimate historical exposure to PM2.5 in China with high accuracy.
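The abstract reports model skill as a 10-fold cross-validated R² and RMSE. As a rough illustration of how such out-of-fold metrics can be computed, the sketch below runs k-fold cross-validation in plain Python. It is not the authors' code: the data are synthetic, the single predictor named `aod` and the helper names `kfold_indices`, `fit_line` and `cross_validate` are hypothetical, and a one-variable least-squares line stands in for the random forests model, which would normally come from a machine learning library. R² is computed here as 1 − SS_res/SS_tot, which may differ from the exact definition used in the study.

```python
import math
import random


def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a simple stand-in for any model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cross_validate(xs, ys, k=10, seed=0):
    """Return (R2, RMSE) computed from out-of-fold predictions."""
    preds = [0.0] * len(xs)
    for fold in kfold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        # Fit on the other k-1 folds, predict only the held-out fold.
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = model(xs[i])
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(ys))


# Synthetic data only (assumption): a noisy linear link between a
# hypothetical AOD predictor and PM2.5 in ug/m3. Not real observations.
rng = random.Random(42)
aod = [rng.uniform(0.1, 1.5) for _ in range(500)]
pm25 = [20.0 + 60.0 * a + rng.gauss(0.0, 10.0) for a in aod]

r2, rmse = cross_validate(aod, pm25, k=10)
print(f"10-fold CV R2 = {r2:.2f}, RMSE = {rmse:.1f} ug/m3")
```

Because every prediction comes from a model that never saw that observation during fitting, the resulting R² and RMSE estimate performance on unseen data rather than goodness of fit on the training set.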

Original language: English
Pages (from-to): 52-60
Number of pages: 9
Journal: Science of the Total Environment
Publication status: Published - 15 Sept 2018


Keywords

  • Aerosol optical depth
  • China
  • Machine learning
  • PM2.5
  • Random forests

ASJC Scopus subject areas

  • Environmental Engineering
  • Environmental Chemistry
  • Waste Management and Disposal
  • Pollution

