Automatic Classification of Students on Twitter Using Simple Profile Information

Lili-Michal Wilson, Christopher Wun


Abstract
Obtaining social media demographic information using machine learning is important for efficient computational social science research. Automatic age classification has been accomplished with relative success and allows for the study of youth populations, but student classification (determining which users are currently attending an academic institution) has not been thoroughly studied. Previous work (He et al., 2016) proposes a model that uses 3 tweet-content features to classify users as students or non-students. This model achieves an accuracy of 84%, but it is restrictive and time-intensive because it requires accessing and processing many user tweets. In this study, we propose classification models that use 7 numerical features and 10 text-based features drawn from simple profile information. These profile-based features allow for faster, more accessible data collection and enable the classification of users without access to their tweets. Compared to previous models, our models identify students with greater accuracy; our best model obtains an accuracy of 88.1% and an F1 score of 0.704. This improved student identification tool has the potential to facilitate research on topics ranging from professional networking to the impact of education on Twitter behaviors.
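As a rough illustration of what profile-based features might look like, the sketch below maps a single profile record to a few numerical and text-derived features and computes the two metrics reported above. It is not the authors' feature set: the abstract does not list the 7 numerical and 10 text-based features, so the keyword list, the graduation-year pattern, and the function names `profile_features` and `accuracy_and_f1` are all hypothetical. The dictionary keys are assumed to follow the field names of a Twitter user object (`description`, `followers_count`, `friends_count`, `statuses_count`).

```python
import re

# Hypothetical keyword list, chosen for illustration only; the paper's
# actual text-based features are not given in the abstract.
STUDENT_KEYWORDS = ("student", "university", "college", "class of", "major")


def profile_features(profile):
    """Map a profile dict to a flat dict of numeric and text-derived features."""
    bio = profile.get("description", "").lower()
    features = {
        # Numerical features read directly from the profile (assumed fields).
        "followers": float(profile.get("followers_count", 0)),
        "friends": float(profile.get("friends_count", 0)),
        "statuses": float(profile.get("statuses_count", 0)),
        "bio_length": float(len(bio)),
    }
    # Binary indicators: does the bio mention a student-related keyword?
    for kw in STUDENT_KEYWORDS:
        features["bio_has_" + kw.replace(" ", "_")] = 1.0 if kw in bio else 0.0
    # Assumed pattern for a graduation year such as "'23" or "2023".
    has_year = re.search(r"(?:'\d{2}\b|\b20\d{2}\b)", bio) is not None
    features["bio_has_year"] = 1.0 if has_year else 0.0
    return features


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for binary labels, with 1 meaning 'student'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives.
    denom = 2 * tp + fp + fn
    f1 = (2 * tp / denom) if denom else 0.0
    return accuracy, f1
```

The resulting feature dicts could be passed to any standard classifier. Accuracy and F1 are computed separately because the two can diverge when one class is much rarer than the other.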
Anthology ID:
2020.aacl-srw.5
Volume:
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop
Month:
December
Year:
2020
Address:
Suzhou, China
Editors:
Boaz Shmueli, Yin Jou Huang
Venue:
AACL
Publisher:
Association for Computational Linguistics
Pages:
30–36
URL:
https://aclanthology.org/2020.aacl-srw.5
Cite (ACL):
Lili-Michal Wilson and Christopher Wun. 2020. Automatic Classification of Students on Twitter Using Simple Profile Information. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 30–36, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Automatic Classification of Students on Twitter Using Simple Profile Information (Wilson & Wun, AACL 2020)
PDF:
https://aclanthology.org/2020.aacl-srw.5.pdf
Code:
christopherwun/twitter-student-classifier