The Danish Gigaword Project

Publication: Working paper/Preprint, Working paper, Research


Abstract

Danish is a North Germanic/Scandinavian language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. However, from a technological perspective, the Danish language has received relatively little attention and, as a result, Danish language technology is hard to develop, in part due to a lack of large or broad-coverage Danish corpora. This paper describes the Danish Gigaword project, which aims to construct a freely available one-billion-word corpus of Danish text that represents the breadth of the written language.
Original language: English
Publisher: arXiv
Number of pages: 6
Status: Published - May 2020

Keywords

  • cs.CL
