Natural Language Processing (NLP) tools are essential for cutting-edge research in the social sciences and humanities that involves large-scale analysis of spoken language. Speech processing (SP) tools are especially useful when working with dialogue data, as they can, for instance, be used for diarization and for generating transcriptions of spoken language data in an unsupervised manner. SP tools have been successfully developed and implemented for a number of languages, such as English, where they have made research on the spoken language considerably more efficient and have reduced transcription costs for many research projects.
However, the existing SP tools for Danish are often inefficient and need improvement. For instance, tools like DanFA (Young & McGarrah, 2017), which aligns speech to orthographic transcription at the level of individual sounds, do not work well with unclear pronunciations or in noisy situations. In general, the existing tools are efficient at processing multi-word utterances in monologues recorded in high quality, but inefficient at processing natural language in real-world dialogue situations. Such dialogue has been shown to be especially challenging to process because of overlapping speech and noisy data, often recorded in natural settings that do not allow high-quality sound. This has left a gap in the research field and has hampered research into the mechanisms at play during interactions between human beings.
Developing SP tools for Danish is particularly problematic because of the unusually opaque sound structure of the language (Trecca et al., in prep.), which includes a large number of vocalic sounds, the weakening of consonants, the pervasive assimilation of schwa vowels, and the reduction of word endings. As a result, interaction research in Danish is often more expensive and time-consuming.
It is therefore necessary to improve the existing SP tools for Danish, both to advance research on the acquisition and use of Danish and to significantly reduce the time and money currently spent on manual transcription.