'Super-Unsupervised' Classification for Labelling Text: Online Political Hostility as an Illustration

Stig Hebbelstrup Rye Rasmussen*, Alexander Bor, Mathias Osmundsen, Michael Bang Petersen

*Corresponding author for this work

Research output: Contribution to journal/Conference contribution in journal/Contribution to newspaper; Journal article; Research; peer-review

2 Citations (Scopus)

Abstract

We live in a world of text. Yet the sheer magnitude of social media data, coupled with a need to measure complex psychological constructs, has made this important source of data difficult to use. Researchers often engage in costly hand coding of thousands of texts using supervised techniques or rely on unsupervised techniques where the measurement of predefined constructs is difficult. We propose a novel approach that we call 'super-unsupervised' learning and demonstrate its usefulness by measuring the psychologically complex construct of online political hostility based on a large corpus of tweets. This approach accomplishes the feat by combining the best features of supervised and unsupervised learning techniques: measurements of complex psychological constructs without a single labelled data source. We first outline the approach before conducting a diverse series of tests that include: (i) face validity, (ii) convergent and discriminant validity, (iii) criterion validity, (iv) external validity, and (v) ecological validity.

Original language: English
Journal: British Journal of Political Science
Volume: 54
Issue: 1
Pages (from-to): 179-200
Number of pages: 22
ISSN: 0007-1234
Publication status: Published, 24 Jan 2024

Keywords

  • natural language processing
  • social media
  • supervised learning
  • unsupervised learning
