Clusterome: A Comprehensive Data Set of Atmospheric Molecular Clusters for Machine Learning Applications

Yosef Knattrup, Jakub Kubečka, Daniel Ayoubi, Jonas Elm*

*Corresponding author for this work

Publication: Contribution to journal › Journal article › Research › Peer-reviewed


Formation and growth of atmospheric molecular clusters into aerosol particles impact the global climate and contribute to the high uncertainty in modern climate models. Cluster formation is usually studied using quantum chemical methods, which quickly become computationally expensive as system sizes grow. In this work, we present a large database of ∼250k atmospherically relevant cluster structures, which can be applied for developing machine learning (ML) models. The database is used to train kernel ridge regression (KRR) models with the FCHL19 representation. We test the ability of the model to extrapolate from smaller clusters to larger clusters, between different molecules, and between equilibrium structures and out-of-equilibrium structures, as well as its transferability to systems with new interactions. We show that KRR models can extrapolate to larger sizes and transfer acid and base interactions with mean absolute errors below 1 kcal/mol. We suggest introducing an iterative ML step in configurational sampling processes, which can reduce the computational expense. Such an approach would allow us to study significantly more cluster systems at higher accuracy than previously possible and thereby allow us to cover a much larger part of relevant atmospheric compounds.
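For readers unfamiliar with kernel ridge regression, the fit-and-evaluate step mentioned in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: a plain Gaussian kernel on a one-dimensional toy descriptor stands in for the FCHL19 representation, the target values are synthetic (a sine curve, not cluster energies), and all function names and hyperparameters (`sigma`, `lam`) are hypothetical choices made for this example.

```python
import math

def gaussian_kernel(a, b, sigma):
    """Gaussian (RBF) kernel between two descriptor vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit(X, y, sigma, lam):
    """Return regression weights alpha solving (K + lam * I) alpha = y."""
    K = [[gaussian_kernel(a, b, sigma) for b in X] for a in X]
    for i in range(len(X)):
        K[i][i] += lam
    return solve_linear(K, y)

def krr_predict(X_train, alpha, X_new, sigma):
    """Predict targets as kernel-weighted sums over the training set."""
    return [
        sum(w * gaussian_kernel(x, xt, sigma) for w, xt in zip(alpha, X_train))
        for x in X_new
    ]

def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Synthetic toy data (assumption): 1D descriptors with a smooth target.
X_train = [[0.15 * i] for i in range(41)]          # grid on [0, 6]
y_train = [math.sin(x[0]) for x in X_train]
X_test = [[0.075 + 0.3 * i] for i in range(20)]    # points between grid nodes
y_test = [math.sin(x[0]) for x in X_test]

sigma, lam = 0.5, 1e-6                             # illustrative hyperparameters
alpha = krr_fit(X_train, y_train, sigma, lam)
mae = mean_absolute_error(y_test, krr_predict(X_train, alpha, X_test, sigma))
print(f"Test MAE on the toy problem: {mae:.2e}")
```

In a real application the descriptor vectors would come from a molecular representation such as FCHL19 and the linear solve would be handled by an optimized library routine; the pure-Python solver here only keeps the sketch self-contained.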

Journal: ACS Omega
Pages (from-to): 25155-25164
Number of pages: 10
Status: Published - Jul. 2023

