Automatic proficiency scoring for early-stage writing

Michael Riis Andersen, Kristine Kabel, Jesper Bremholm, Jeppe Bundsgaard, Lars Kai Hansen

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper › Journal article › Research › peer-review


In this work, we study the feasibility of using machine learning and natural language processing methods to assess writing proficiency in Danish with respect to text construction, sentence construction, and use of modifiers. Our work builds on the analytical framework for scoring early writing proposed by Kabel et al. (2022), in which each text is first annotated by a human expert according to a predefined coding scheme and subsequently scored using statistical Rasch modelling (Rasch, 1960). We investigate two strategies for estimating these scores automatically: 1) we propose a system that automatically identifies the central linguistic features, mimicking the role of the human experts, and 2) we train state-of-the-art discriminative machine learning models to predict the proficiency scores directly from the texts. We conduct a number of experiments to evaluate and compare the two approaches. Our results show strong and statistically significant correlations between the scores generated by the automatic system and the scores based on human experts. We also estimate and report the reliability of the individual linguistic features in the automatic annotation system. Finally, we propose and evaluate an extension of the statistical model that allows it to compensate for potential systematic errors in the automatic annotations. The article thereby contributes to the area of automated essay scoring (AES) and shows that it is possible to provide teachers with automated, valid, and reliable knowledge about the development of their students' writing competences, which they can use in their feedback to students.
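
To make the scoring step concrete, the following is a minimal sketch of how a proficiency score could be derived from binary feature annotations under the dichotomous Rasch model, where the probability that a feature is present depends only on the difference between the writer's ability and the feature's difficulty. It is an illustration under stated assumptions, not the authors' implementation: the article may use a different (for example polytomous) Rasch variant, and the item difficulties, the annotation vectors, and the function names `rasch_probability` and `estimate_ability` are hypothetical.

```python
import math


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(feature present) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(responses, difficulties, iterations=50):
    """Maximum-likelihood ability estimate by Newton-Raphson, with item difficulties held fixed."""
    raw_score = sum(responses)
    if raw_score == 0 or raw_score == len(responses):
        # The likelihood has no finite maximum for all-0 or all-1 response patterns.
        raise ValueError("ML estimate is not finite for all-0 or all-1 responses")
    theta = 0.0
    for _ in range(iterations):
        probs = [rasch_probability(theta, b) for b in difficulties]
        gradient = raw_score - sum(probs)                # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)  # -d^2 logL / d theta^2
        step = gradient / information
        theta += step
        if abs(step) < 1e-10:
            break
    return theta


# Hypothetical item difficulties for three linguistic features (assumed, not from the article).
difficulties = [-1.0, 0.0, 1.0]

# Hypothetical annotations of one text: 1 = feature present, 0 = absent.
human_annotation = [1, 1, 0]      # as coded by a human expert
automatic_annotation = [1, 0, 0]  # as produced by an automatic annotator that missed one feature

theta_human = estimate_ability(human_annotation, difficulties)
theta_auto = estimate_ability(automatic_annotation, difficulties)
print(f"score from human annotation:     {theta_human:.3f}")
print(f"score from automatic annotation: {theta_auto:.3f}")
```

In this sketch a single missed feature lowers the estimated score, which illustrates the kind of systematic annotation error that the extension of the statistical model described in the abstract is meant to compensate for.
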
Original language: English
Article number: 100168
Journal: Computers and Education: Artificial Intelligence
Number of pages: 10
Publication status: Published - Oct 2023


  • Automatic scoring
  • Danish
  • Early-stage literacy
  • Low-resource languages
  • Machine learning
  • Natural language processing
  • Rasch models
  • Writing proficiency
