Class-based Identification of ‘Deviant’ Semantic Features in Historical Corpora

Activity: Talk or presentation (Lecture and oral contribution)

See relations at Aarhus University

Kristoffer Laigaard Nielbo - Lecturer

In the digital and computationally informed humanities, unsupervised learning (e.g., graph-based clustering and mixed membership models) tends to be the preferred approach to the automatic extraction of semantics from text-heavy data. Although this approach simplifies the corpus and thereby reduces the researcher’s interpretive burden, it favors very general features whose coherence still relies heavily on human interpretation (Latent Dirichlet Allocation, for instance, extracts a general thematic structure that is diluted by ‘junk structure’). An alternative, yet complementary, approach is supervised learning. In supervised learning, we use class information (e.g., genre or temporal epoch) to emulate human concept learning in the corpus. While the standard goal of supervised learning is document classification, we will present a model prototype that utilizes a simple algorithm to extract class-typical (‘core’) and atypical (‘deviant’) semantic features from a set of documents.
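The abstract does not specify the algorithm, so the following is only a minimal sketch of one way class information could be used to separate ‘core’ from ‘deviant’ features. It assumes a smoothed log-ratio of term frequencies inside versus outside a class; the function names, the toy corpus, and the smoothing parameter `alpha` are hypothetical and are not taken from the talk.

```python
from collections import Counter
import math


def class_term_scores(docs, labels, target, alpha=1.0):
    """Smoothed log-ratio of each term's relative frequency inside vs. outside `target`.

    Assumption: this scoring rule is illustrative, not the prototype from the talk.
    """
    inside, outside = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (inside if label == target else outside).update(tokens)
    vocab = set(inside) | set(outside)
    # Additive smoothing keeps the ratio finite for terms unseen on one side.
    n_in = sum(inside.values()) + alpha * len(vocab)
    n_out = sum(outside.values()) + alpha * len(vocab)
    return {
        term: math.log((inside[term] + alpha) / n_in)
        - math.log((outside[term] + alpha) / n_out)
        for term in inside  # score only terms attested in the target class
    }


def core_and_deviant(docs, labels, target, k=2):
    """Return the k most class-typical terms and up to k class-atypical terms."""
    scores = class_term_scores(docs, labels, target)
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    core = ranked[:k]
    # 'Deviant' here means: used in the class, yet more typical of other classes.
    deviant = [term for term in reversed(ranked) if scores[term] < 0][:k]
    return core, deviant


# Hypothetical toy corpus: tokenized documents labeled by temporal epoch.
docs = [
    ["king", "crown", "sword"],
    ["king", "sword", "battle"],
    ["king", "engine"],
    ["engine", "factory", "steam"],
    ["engine", "steam", "railway"],
]
labels = ["medieval", "medieval", "medieval", "industrial", "industrial"]

core, deviant = core_and_deviant(docs, labels, target="medieval")
print("core:", core)
print("deviant:", deviant)
```

In this toy setting, terms concentrated in the target epoch rank as ‘core’, while a term that occurs in the target epoch but is more frequent elsewhere is flagged as ‘deviant’. A real corpus would need tokenization, frequency thresholds, and some form of significance testing, none of which is shown here.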
3 Nov 2016


Title: How To Do Things With Millions of Words
Location: University of British Columbia
Degree of recognition: International event

ID: 107323779