Josephine Lukito
2024
Comparing a BERT Classifier and a GPT classifier for Detecting Connective Language Across Multiple Social Media
Josephine Lukito | Bin Chen | Gina M. Masullo | Natalie Jomini Stroud
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
This study presents an approach for detecting connective language—defined as language that facilitates engagement, understanding, and conversation—from social media discussions. We developed and evaluated two types of classifiers: BERT and GPT-3.5 turbo. Our results demonstrate that the BERT classifier significantly outperforms GPT-3.5 turbo in detecting connective language. Furthermore, our analysis confirms that connective language is distinct from related concepts measuring discourse qualities, such as politeness and toxicity. We also explore the potential of BERT-based classifiers for platform-agnostic tools. This research advances our understanding of the linguistic dimensions of online communication and proposes practical tools for detecting connective language across diverse digital environments.
2019
Using time series and natural language processing to identify viral moments in the 2016 U.S. Presidential Debate
Josephine Lukito | Prathusha K Sarma | Jordan Foley | Aman Abhishek
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
This paper proposes a method for identifying and studying viral moments or highlights during a political debate. Using a combined strategy of time series analysis and domain-adapted word embeddings, this study provides an in-depth analysis of several key moments during the 2016 U.S. Presidential election. First, a time series outlier analysis is used to identify key moments during the debate. These moments had to result in a long-term shift in attention towards either Hillary Clinton or Donald Trump (i.e., a transient change outlier or an intervention, resulting in a permanent change in the time series). To assess whether these moments also resulted in a discursive shift, two corpora are produced for each potential viral moment (a pre-viral corpus and post-viral corpus). A domain adaptation layer learns weights to combine a generic and domain-specific (DS) word embedding into a domain-adapted (DA) embedding. Words are then classified using a generic encoder + classifier framework that relies on these word embeddings as inputs. Results suggest that both Clinton and Trump were able to induce discourse-shifting viral moments, though the former is much better at producing a topically specific discursive shift.
Co-authors
- Bin Chen 1
- Gina M. Masullo 1
- Natalie Jomini Stroud 1
- Prathusha Kameswara Sarma 1
- Jordan Foley 1