DKIE: Open Source Information Extraction for Danish

Research output: Contribution to book/anthology/report/proceeding, Article in proceedings, Research, peer-review

  • Leon Derczynski, University Of Sheffield, United Kingdom
  • Camilla Vilhelmsen Field, University of Southern Denmark, Denmark
  • Kenneth Sejdenfaden Bøgh, Denmark
Danish is a major Scandinavian language spoken daily by around six million people. However, it lacks a unified, open set of NLP tools. This demonstration will introduce DKIE, an extensible open-source toolkit for processing Danish text. We implement an information extraction architecture for Danish within GATE, including integrated third-party tools. This implementation includes the creation of a substantial set of corpus annotations for data-intensive named entity recognition. The final application and dataset are made openly available, and the part-of-speech tagger and NER model also operate independently or with the Stanford NLP toolkit.
Original language: English
Title of host publication: Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics
Editors: Shuly Wintner, Marko Tadić, Bogdan Babych
Number of pages: 4
Publisher: Association for Computational Linguistics
Publication year: 2014
Pages: 61-64
Publication status: Published - 2014
Event: Conference of the European Chapter of the Association for Computational Linguistics - Gothenburg, Sweden
Duration: 26 Apr 2014 to 30 Apr 2014
Conference number: 14

Conference

Conference: Conference of the European Chapter of the Association for Computational Linguistics
Number: 14
Country: Sweden
City: Gothenburg
Period: 26/04/2014 to 30/04/2014

ID: 86973231