Text Length and the Function of Intentionality: A Case Study of Contrastive Subreddits

Emily Sofi Ohman, Aatu Liimatta


Abstract
Text length is of central concern in natural language processing (NLP) tasks, yet it is very much under-researched. In this paper, we use social media data, specifically Reddit, to explore the function of text length and intentionality by contrasting subreddits of the same topic where one is considered more serious/professional/academic and the other more relaxed/beginner/layperson. We hypothesize that word choices are more deliberate and intentional in the more in-depth and professional subreddits with texts subsequently becoming longer as a function of this intentionality. We argue that this has deep implications for many applied NLP tasks such as emotion and sentiment analysis, fake news and disinformation detection, and other modeling tasks focused on social media and similar platforms where users interact with each other via the medium of text.
Anthology ID:
2024.nlp4dh-1.1
Volume:
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Month:
November
Year:
2024
Address:
Miami, USA
Editors:
Mika Hämäläinen, Emily Öhman, So Miyagawa, Khalid Alnajjar, Yuri Bizzoni
Venue:
NLP4DH
Publisher:
Association for Computational Linguistics
Pages:
1–8
URL:
https://aclanthology.org/2024.nlp4dh-1.1
Cite (ACL):
Emily Sofi Ohman and Aatu Liimatta. 2024. Text Length and the Function of Intentionality: A Case Study of Contrastive Subreddits. In Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities, pages 1–8, Miami, USA. Association for Computational Linguistics.
Cite (Informal):
Text Length and the Function of Intentionality: A Case Study of Contrastive Subreddits (Ohman & Liimatta, NLP4DH 2024)
PDF:
https://aclanthology.org/2024.nlp4dh-1.1.pdf