TY - JOUR
T1 - 'Super-Unsupervised' Classification for Labelling Text
T2 - Online Political Hostility as an Illustration
AU - Hebbelstrup Rye Rasmussen, Stig
AU - Bor, Alexander
AU - Osmundsen, Mathias
AU - Petersen, Michael Bang
N1 - Publisher Copyright: Copyright © The Author(s), 2023. Published by Cambridge University Press.
PY - 2024/1/24
Y1 - 2024/1/24
AB - We live in a world of text. Yet the sheer magnitude of social media data, coupled with a need to measure complex psychological constructs, has made this important source of data difficult to use. Researchers often engage in costly hand coding of thousands of texts using supervised techniques or rely on unsupervised techniques where the measurement of predefined constructs is difficult. We propose a novel approach that we call 'super-unsupervised' learning and demonstrate its usefulness by measuring the psychologically complex construct of online political hostility based on a large corpus of tweets. This approach accomplishes the feat by combining the best features of supervised and unsupervised learning techniques: measurements of complex psychological constructs without a single labelled data source. We first outline the approach before conducting a diverse series of tests that include: (i) face validity, (ii) convergent and discriminant validity, (iii) criterion validity, (iv) external validity, and (v) ecological validity.
KW - natural language processing
KW - social media
KW - supervised learning
KW - unsupervised learning
UR - http://www.scopus.com/inward/record.url?scp=85179464907&partnerID=8YFLogxK
U2 - 10.1017/S0007123423000042
DO - 10.1017/S0007123423000042
M3 - Journal article
AN - SCOPUS:85179464907
SN - 0007-1234
VL - 54
SP - 179
EP - 200
JO - British Journal of Political Science
JF - British Journal of Political Science
IS - 1
ER -