The formation of atmospheric molecular clusters is an important stage in the formation of new particles in the atmosphere. Despite intensive research, the exact chemical species involved in the initial steps of new particle formation remain elusive. In this Perspective, the main challenges and recent progress in the field are outlined, with special emphasis on the chemical complexity of the puzzle and the prospects for modeling larger clusters. In general, there is a high demand for accurate and more complete quantum chemical data sets that can be applied in cluster distribution dynamics models and coupled to atmospheric chemical transport models. A view is presented on how the community could reach this goal by applying data-driven machine learning approaches for more efficient exploration of cluster configurations. A path toward larger clusters and direct molecular dynamics simulations of cluster formation and growth using machine learning models is also discussed.