DanSumT5: Automatic Abstractive Summarization for Danish

Sara Kolding, Katrine Nymann, Ida Bang Hansen, Kenneth Enevoldsen, Ross Deans Kristensen-McLachlan*

*Corresponding author for this work

Publication: Contribution to book/anthology/report/proceedings; Conference contribution in proceedings; Research; Peer-reviewed

Citations (Scopus): 1

Abstract

Automatic abstractive text summarization is a challenging task in the field of natural language processing. This paper presents a model for domain-specific summarization for Danish news articles. DanSumT5 is an mT5 model fine-tuned on a cleaned subset of the DaNewsroom dataset comprising abstractive article-summary pairs. The resulting state-of-the-art model is evaluated both quantitatively and qualitatively, using ROUGE and BERTScore metrics, along with human rankings of the summaries. We find that although model refinements increase quantitative and qualitative performance, the model is still prone to factual errors. We discuss the limitations of current evaluation methods for automatic abstractive summarization and underline the need for improved metrics and transparency within the field. We suggest that future work should employ techniques for detecting and reducing errors in model output and methods for reference-less evaluation of summaries.
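The abstract names ROUGE as one of the quantitative metrics. To show roughly what ROUGE-1 measures, the sketch below computes unigram-overlap precision, recall and F1 between a reference summary and a generated summary. It is a simplified illustration, not the paper's evaluation code: it assumes lowercase whitespace tokenization with no stemming or Danish-specific preprocessing, and the paper's actual ROUGE configuration is not described in this record. The function name `rouge_1` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> dict:
    """Simplified ROUGE-1: unigram-overlap precision, recall and F1.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: Counter intersection keeps the minimum count of each unigram.
    overlap = sum((ref_counts & cand_counts).values())
    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical reference and generated summaries (not from DaNewsroom).
scores = rouge_1("regeringen præsenterer ny klimaplan", "regeringen præsenterer en klimaplan")
print(scores)  # {'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Because ROUGE only counts surface overlap, a generated summary can score well while still stating something the article does not support. This is one reason the abstract calls for improved metrics and reference-less evaluation.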

Original language: English
Title: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Editors: Tanel Alumäe, Mark Fishel
Number of pages: 17
Place of publication: Tartu
Publisher: Tartu University Press
Publication date: 2023
Pages: 248-264
ISBN (Print): 978-99-1621-999-7
ISBN (Electronic): 9789916219997
Status: Published - 2023
Event: 24th Nordic Conference on Computational Linguistics, NoDaLiDa 2023 - Torshavn, Faroe Islands
Duration: 22 May 2023 to 24 May 2023

Series: NEALT (Northern European Association of Language Technology) Proceedings Series
Volume: 52
ISSN: 1736-6305
