Class-based Identification of ‘Deviant’ Semantic Features in Historical Corpora

Activity: Presentations, memberships, employment, ownership and other activities - Lecture and oral contribution

Description

In the digital and computationally informed humanities, unsupervised learning tends to be the preferred approach to the automatic extraction of semantics from text-heavy data (e.g., graph-based clustering and mixed membership models). Although this approach simplifies the corpus and thereby reduces the researcher’s interpretive burden, it favors very general features whose coherence still relies heavily on human interpretation (Latent Dirichlet Allocation, for instance, extracts a general thematic structure that is diluted by ‘junk structure’). An alternative, yet complementary, approach is supervised learning, in which class information (e.g., genre or temporal epoch) is used to emulate human concept learning in the corpus. While the standard goal of supervised learning is document classification, we will present a model prototype that utilizes a simple algorithm to extract class-typical (‘core’) and atypical (‘deviant’) semantic features from a set of documents.
Period: 3 Nov 2016
Event title: How To Do Things With Millions of Words
Event type: Conference
Location: Vancouver, Canada
Degree of Recognition: International