Tweet Classification without the Tweet: An Empirical Examination of User versus Document Attributes

Veronica Lynn; Salvatore Giorgi; Niranjan Balasubramanian; H. Andrew Schwartz

doi:10.18653/v1/W19-2103

Abstract

NLP naturally puts a primary focus on leveraging document language, occasionally considering user attributes as supplemental. However, as we tackle more social scientific tasks, it is possible that user attributes might be of primary importance and the document supplemental. Here, we systematically investigate the predictive power of user-level features alone versus document-level features for document-level tasks. We first show that user attributes can sometimes carry more task-related information than the document itself. For example, a tweet-level stance detection model using only 13 user-level attributes (i.e. features that did not depend on the specific tweet) was able to obtain a higher F1 than the top-performing SemEval participant. We then consider multiple tasks and a wider range of user attributes, showing that the performance of strong document-only models can often be improved with user attributes (as in stance, sentiment, and sarcasm), with tasks that have stable “trait-like” outcomes (e.g. stance) benefiting more than tasks with frequently changing “state-like” outcomes (e.g. sentiment). These results not only support the growing work on integrating user factors into predictive systems, but also suggest that some of our NLP tasks might be better cast primarily as user-level (or human) tasks.

Anthology ID: W19-2103
Volume: Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
Month: June
Year: 2019
Address: Minneapolis, Minnesota
Editors: Svitlana Volkova, David Jurgens, Dirk Hovy, David Bamman, Oren Tsur
Venue: NLP+CSS
Publisher: Association for Computational Linguistics
Pages: 18–28
URL: https://aclanthology.org/W19-2103/
DOI: 10.18653/v1/W19-2103
Cite (ACL): Veronica Lynn, Salvatore Giorgi, Niranjan Balasubramanian, and H. Andrew Schwartz. 2019. Tweet Classification without the Tweet: An Empirical Examination of User versus Document Attributes. In Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science, pages 18–28, Minneapolis, Minnesota. Association for Computational Linguistics.
Cite (Informal): Tweet Classification without the Tweet: An Empirical Examination of User versus Document Attributes (Lynn et al., NLP+CSS 2019)
PDF: https://aclanthology.org/W19-2103.pdf
