The Danish Gigaword Project

Publication: Working paper (Research)

  • Leon Strømberg-Derczynski, IT University of Copenhagen
  • Rebekah Baglini
  • Morten H. Christiansen
  • Manuel R. Ciosici
  • Jacob Aarup Dalsgaard, Aarhus Universitet
  • Riccardo Fusaroli
  • Peter Juel Henrichsen
  • Rasmus Hvingelby
  • Andreas Kirkedal, IT University of Copenhagen
  • Alex Speed Kjeldsen, University of Copenhagen
  • Claus Ladefoged, TV2
  • Finn Årup Nielsen, Danmarks Tekniske Universitet
  • Malte Lau Petersen
  • Jonathan Hvithamar Rystrøm, Aarhus Universitet
  • Daniel Varab, IT University of Copenhagen, Novo Nordisk

Danish is a North Germanic (Scandinavian) language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. From a technological perspective, however, the Danish language has received relatively little attention. As a result, Danish language technology is hard to develop, in part because large or broad-coverage Danish corpora are lacking. This paper describes the Danish Gigaword project, which aims to construct a freely available one-billion-word corpus of Danish text that represents the breadth of the written language.
Original language: English
Publisher: arXiv
Number of pages: 6
Status: Published - May 2020

    Research areas

  • cs.CL

ID: 189995107