DanSumT5: Automatic Abstractive Summarization for Danish

Sara Kolding, Katrine Nymann, Ida Bang Hansen, Kenneth Enevoldsen, Ross Deans Kristensen-McLachlan*

*Corresponding author for this work

Research output: Contribution to book/anthology/report/proceeding; Article in proceedings; Research; peer-review

Abstract

Automatic abstractive text summarization is a challenging task in the field of natural language processing. This paper presents a model for domain-specific summarization for Danish news articles, DanSumT5; an mT5 model fine-tuned on a cleaned subset of the DaNewsroom dataset consisting of abstractive summary-article pairs. The resulting state-of-the-art model is evaluated both quantitatively and qualitatively, using ROUGE and BERTScore metrics and human rankings of the summaries. We find that although model refinements increase quantitative and qualitative performance, the model is still prone to factual errors. We discuss the limitations of current evaluation methods for automatic abstractive summarization and underline the need for improved metrics and transparency within the field. We suggest that future work should employ methods for detecting and reducing errors in model output and methods for referenceless evaluation of summaries.
Original language: English
Title of host publication: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Editors: Tanel Alumäe, Mark Fishel
Place of publication: Tartu
Publisher: Tartu University Press
Publication date: 2023
Pages: 248-264
ISBN (Print): 978-99-1621-999-7
Publication status: Published - 2023
Series: NEALT (Northern European Association of Language Technology) Proceedings Series
Volume: 52
ISSN: 1736-6305
