
Development of Europe-Wide Models for Particle Elemental Composition Using Supervised Linear Regression and Random Forest

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review



  • Jie Chen, Utrecht University
  • Kees De Hoogh, Swiss Tropical and Public Health Institute, University of Basel
  • John Gulliver, University of Leicester
  • Barbara Hoffmann, Heinrich Heine University Düsseldorf
  • Ole Hertel
  • Matthias Ketzel
  • Gudrun Weinmayr, Ulm University
  • Mariska Bauwelinck, Vrije Universiteit Brussel
  • Aaron Van Donkelaar, Dalhousie University, Washington University St. Louis
  • Ulla A. Hvidtfeldt, Danish Cancer Society
  • Richard Atkinson, St. George's University of London
  • Nicole A.H. Janssen, National Institute for Public Health and the Environment (RIVM)
  • Randall V. Martin, Dalhousie University, Danish Cancer Society, Harvard University
  • Evangelia Samoli, University of Athens
  • Zorana J. Andersen, University of Copenhagen
  • Bente M. Oftedal, Norwegian Institute of Public Health
  • Massimo Stafoggia, Department of Epidemiology Lazio Regional Health Service, Karolinska Institutet
  • Tom Bellander, Karolinska Institutet
  • Maciej Strak, Utrecht University, Harvard University
  • Kathrin Wolf, Helmholtz Zentrum München - German Research Center for Environmental Health
  • Danielle Vienneau, Swiss Tropical and Public Health Institute, University of Basel
  • Bert Brunekreef, Utrecht University
  • Gerard Hoek, Utrecht University

We developed Europe-wide models of long-term exposure to eight elements (copper, iron, potassium, nickel, sulfur, silicon, vanadium, and zinc) in particulate matter with diameter <2.5 μm (PM2.5) using standardized measurements for one-year periods between October 2008 and April 2011 in 19 study areas across Europe, with supervised linear regression (SLR) and random forest (RF) algorithms. Potential predictor variables were obtained from satellites, chemical transport models, land-use, traffic, and industrial point source databases to represent different sources. Overall model performance across Europe was moderate to good for all elements with hold-out-validation R-squared ranging from 0.41 to 0.90. RF consistently outperformed SLR. Models explained within-area variation much less than the overall variation, with similar performance for RF and SLR. Maps proved a useful additional model evaluation tool. Models differed substantially between elements regarding major predictor variables, broadly reflecting known sources. Agreement between the two algorithm predictions was generally high at the overall European level and varied substantially at the national level. Applying the two models in epidemiological studies could lead to different associations with health. If both between- and within-area exposure variability are exploited, RF may be preferred. If only within-area variability is used, both methods should be interpreted equally.
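
The gap between overall and within-area performance reported above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy concentrations, the two hypothetical study areas "A" and "B", the function names, and the choice to define within-area R-squared by subtracting each area's mean from the observed and predicted values before scoring. The sketch shows how predictions that capture only the difference between areas can score well overall while explaining none of the variation inside each area.

```python
from collections import defaultdict


def r_squared(observed, predicted):
    """Mean-squared-error-based R-squared: 1 - SSE / SST."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def center_by_area(values, areas):
    """Subtract each area's mean from the values measured in that area."""
    groups = defaultdict(list)
    for value, area in zip(values, areas):
        groups[area].append(value)
    means = {area: sum(vs) / len(vs) for area, vs in groups.items()}
    return [value - means[area] for value, area in zip(values, areas)]


def within_area_r_squared(observed, predicted, areas):
    """R-squared after removing between-area differences (assumed definition)."""
    return r_squared(center_by_area(observed, areas),
                     center_by_area(predicted, areas))


# Hypothetical hold-out data: area B has much higher concentrations than area A.
areas = ["A", "A", "A", "B", "B", "B"]
observed = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
# Predictions that reproduce each area's level but no contrast inside an area.
predicted = [2.0, 2.0, 2.0, 12.0, 12.0, 12.0]

overall = r_squared(observed, predicted)
within = within_area_r_squared(observed, predicted, areas)

print(f"overall R-squared:     {overall:.3f}")  # 0.974
print(f"within-area R-squared: {within:.3f}")   # 0.000
```

In this toy case almost all of the variance lies between the two areas, so a model can reach a high overall R-squared without ranking sites correctly inside either area. That is the situation in which the choice of algorithm matters less for studies relying only on within-area contrasts.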

Original language: English
Journal: Environmental Science and Technology
Pages (from-to): 15698-15709
Number of pages: 12
Publication status: Published - 15 Dec 2020

Bibliographical note

Publisher Copyright:
© 2020 American Chemical Society.

Copyright 2021 Elsevier B.V., All rights reserved.


ID: 217288699