Jason Yan
2024
Tab2Text - A framework for deep learning with tabular data
Tong Lin | Jason Yan | David Jurgens | Sabina J Tomkins
Findings of the Association for Computational Linguistics: EMNLP 2024
Tabular data, from public opinion surveys to records of interactions with social services, is foundational to the social sciences. One application of such data is to fit supervised learning models in order to predict consequential outcomes, for example, whether a family is likely to be evicted, whether a student will graduate from high school or is at risk of dropping out, and whether a voter will turn out in an upcoming election. While supervised learning has seen drastic improvements in performance with advancements in deep learning technology, these gains are largely lost on tabular data, which poses unique difficulties for deep learning frameworks. We propose a technique for transforming tabular data to text data and demonstrate the extent to which this technique can improve the performance of deep learning models for tabular data. Overall, we find modest gains (1.5% on average). Interestingly, we find that these gains do not depend on using large language models to generate text.
2023
Exploring Linguistic Style Matching in Online Communities: The Role of Social Context and Conversation Dynamics
Aparna Ananthasubramaniam | Hong Chen | Jason Yan | Kenan Alkiek | Jiaxin Pei | Agrima Seth | Lavinia Dunagan | Minje Choi | Benjamin Litterer | David Jurgens
Proceedings of the First Workshop on Social Influence in Conversations (SICon 2023)
Linguistic style matching (LSM) in conversations can reflect several aspects of social influence, such as power or persuasion. However, how LSM relates to the outcomes of online communication on platforms such as Reddit remains an open question. In this study, we analyze a large corpus of two-party conversation threads on Reddit, in which we identify all occurrences of LSM using two types of style: the use of function words and formality. Using this framework, we examine how levels of LSM differ in conversations depending on several social factors within Reddit: post and subreddit features, conversation depth, user tenure, and the controversiality of a comment. Finally, we measure the change in LSM following loss of status after community banning. Our findings reveal the interplay of LSM in Reddit conversations with several community metrics, suggesting the importance of understanding conversation engagement when studying community dynamics.
Co-authors
- David Jurgens 2
- Tong Lin 1
- Sabina J Tomkins 1
- Aparna Ananthasubramaniam 1
- Hong Chen 1