Oren Tsur


2023

A Deeper (Autoregressive) Approach to Non-Convergent Discourse Parsing
Oren Tsur | Yoav Tulpan
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Online social platforms provide a bustling arena for information-sharing and for multi-party discussions. Various frameworks for dialogic discourse parsing were developed and used for the processing of discussions and for predicting the productivity of a dialogue. However, most of these frameworks are not suitable for the analysis of contentious discussions that are commonplace on many online platforms. A novel multi-label scheme for contentious dialog parsing was recently introduced by Zakharov et al. (2021). While the scheme is well developed, the computational approach they provide is both naive and inefficient, as a different model (architecture), using a different representation of the input, is trained for each of the 31 tags in the annotation scheme. Moreover, all their models assume full knowledge of label collocations and context, which is unlikely in any realistic setting. In this work, we present a unified model for Non-Convergent Discourse Parsing that does not require any additional input other than the previous dialog utterances. We fine-tuned a RoBERTa backbone, combining embeddings of the utterance, the context, and the labels through GRN layers and an asymmetric loss function. Overall, our model achieves results comparable to SOTA without using label collocations and without training a unique architecture/model for each label. Our proposed architecture makes labeling feasible at large scale, promoting the development of tools that deepen our understanding of discourse dynamics.
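The abstract does not spell out the asymmetric loss; a minimal sketch of one common formulation for multi-label classification (asymmetric focusing with a probability margin, in the spirit of Ridnik et al.'s ASL) is shown below, where `gamma_pos`, `gamma_neg`, and `margin` are assumed hyperparameters, not values taken from the paper:

```python
import numpy as np

def asymmetric_loss(y_true, probs, gamma_pos=0.0, gamma_neg=4.0, margin=0.05):
    """Asymmetric multi-label loss sketch: down-weights easy negatives far more
    aggressively than positives. Hyperparameter values are hypothetical."""
    probs = np.clip(probs, 1e-8, 1 - 1e-8)
    # Probability shifting: subtract a margin so very easy negatives vanish.
    probs_neg = np.clip(probs - margin, 1e-8, 1 - 1e-8)
    loss_pos = y_true * (1 - probs) ** gamma_pos * np.log(probs)
    loss_neg = (1 - y_true) * probs_neg ** gamma_neg * np.log(1 - probs_neg)
    return -(loss_pos + loss_neg).mean()
```

With `gamma_neg > gamma_pos`, the many easy negative tags (most of the 31 labels for any given utterance) contribute almost nothing to the gradient, focusing training on positives and hard negatives.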

2022

Free speech or Free Hate Speech? Analyzing the Proliferation of Hate Speech in Parler
Abraham Israeli | Oren Tsur
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

Social platforms such as Gab and Parler, branded as ‘free-speech’ networks, have seen significant growth of their user base in recent years. This popularity is mainly attributed to the stricter moderation enforced by mainstream platforms such as Twitter, Facebook, and Reddit. In this work we provide the first large-scale analysis of hate speech on Parler. We experiment with an array of algorithms for hate-speech detection, demonstrating the limitations of transfer learning in this domain, given the elusive and ever-changing ways in which hate speech is delivered. In order to improve classification accuracy we annotated 10K Parler posts, which we use to fine-tune a BERT classifier. Classification of individual posts is then leveraged for the classification of millions of users via label propagation over the social network. Classifying users by their propensity to disseminate hate, we find that hate mongers make up 16.1% of Parler’s active users, and that they have distinct characteristics compared to other user groups. We further complement our analysis by comparing the trends observed on Parler to those found on Gab. To the best of our knowledge, this is among the first works to analyze hate speech on Parler in a quantitative manner and at the user level.
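The user-level step above can be illustrated with a bare-bones label propagation sketch (the paper's exact propagation scheme may differ); the adjacency structure, node names, and 0/1 hate scores below are hypothetical:

```python
def propagate_labels(adj, seed_labels, n_iters=10):
    """Toy label propagation over a social graph: seeded nodes (users whose
    posts were classified, 1.0 = hate, 0.0 = neutral) keep their score;
    all other nodes repeatedly adopt the mean score of their neighbors."""
    scores = {node: seed_labels.get(node, 0.5) for node in adj}
    for _ in range(n_iters):
        new_scores = {}
        for node, neighbors in adj.items():
            if node in seed_labels:
                new_scores[node] = seed_labels[node]  # classifier-labeled users stay fixed
            elif neighbors:
                new_scores[node] = sum(scores[n] for n in neighbors) / len(neighbors)
            else:
                new_scores[node] = scores[node]       # isolated node keeps its prior
        scores = new_scores
    return scores
```

Thresholding the converged scores yields a hate/neutral partition of users who never posted classifiable content themselves.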

Detecting Suicide Risk in Online Counseling Services: A Study in a Low-Resource Language
Amir Bialer | Daniel Izmaylov | Avi Segal | Oren Tsur | Yossi Levi-Belz | Kobi Gal
Proceedings of the 29th International Conference on Computational Linguistics

With the increased awareness of situations of mental crisis and their societal impact, online services providing emergency support are becoming commonplace in many countries. Computational models, trained on discussions between help-seekers and providers, can support suicide prevention by identifying at-risk individuals. However, the lack of domain-specific models, especially in low-resource languages, poses a significant challenge for the automatic detection of suicide risk. We propose a model that combines pre-trained language models (PLM) with a fixed set of manually crafted (and clinically approved) suicidal cues, followed by a two-stage fine-tuning process. Our model achieves 0.91 ROC-AUC and an F2-score of 0.55, significantly outperforming an array of strong baselines even early on in the conversation, which is critical for real-time detection in the field. Moreover, the model performs well across genders and age groups.

How to Do Things without Words: Modeling Semantic Drift of Emoji
Eyal Arviv | Oren Tsur
Findings of the Association for Computational Linguistics: EMNLP 2022

Emoji have become a significant part of our informal textual communication. Previous work, addressing the societal and linguistic functions of emoji, overlooked the relation between the semantics and the visual variations of the symbols. In this paper we model and analyze the semantic drift of emoji and discuss the features that may contribute to the drift; some are unique to emoji and some are more general. Specifically, we explore the relations between graphical changes and semantic changes.
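Semantic drift of this kind is often operationalized as the cosine distance between a symbol's embeddings trained on corpora from two time periods; a minimal sketch under that assumption (the abstract does not state the paper's exact measure):

```python
import numpy as np

def semantic_drift(emb_t1, emb_t2):
    """Cosine distance between an emoji's embedding in two time periods:
    0.0 = identical usage, higher values = stronger drift."""
    sim = np.dot(emb_t1, emb_t2) / (np.linalg.norm(emb_t1) * np.linalg.norm(emb_t2))
    return 1.0 - sim
```

In practice the two embeddings come from models trained on time-sliced corpora and must be aligned (e.g., via an orthogonal Procrustes rotation) before comparison.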

2021

With Measured Words: Simple Sentence Selection for Black-Box Optimization of Sentence Compression Algorithms
Yotam Shichel | Meir Kalech | Oren Tsur
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Sentence Compression is the task of generating a shorter, yet grammatical, version of a given sentence, preserving the essence of the original sentence. This paper proposes a Black-Box Optimizer for Compression (B-BOC): given a black-box compression algorithm, and assuming that not all sentences need to be compressed, find the best candidates for compression in order to maximize both compression rate and quality. Given a required compression ratio, we consider two scenarios: (i) single-sentence compression, and (ii) sentence-sequence compression. In the first scenario, our optimizer is trained to predict how well each sentence could be compressed while meeting the specified ratio requirement. In the latter, the desired compression ratio is applied to a sequence of sentences (e.g., a paragraph) as a whole, rather than to each individual sentence. To achieve that, we use B-BOC to assign an optimal compression ratio to each sentence, then cast the selection as a Knapsack problem, which we solve using bounded dynamic programming. We evaluate B-BOC in both scenarios on three datasets, demonstrating that our optimizer improves both accuracy and ROUGE-F1 score compared to the direct application of other compression algorithms.
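The Knapsack casting in the sequence scenario can be sketched as a standard 0/1 dynamic program; here `costs[i]` (tokens removed if sentence i is compressed) and `values[i]` (predicted compression quality) are illustrative stand-ins for B-BOC's per-sentence estimates, not the paper's actual formulation:

```python
def compress_knapsack(costs, values, budget):
    """0/1 knapsack over sentences: pick a subset to compress that maximizes
    total predicted quality without removing more than `budget` tokens.
    Returns (best total value, sorted indices of chosen sentences)."""
    n = len(costs)
    # dp[i][b] = best value using the first i sentences with token budget b
    dp = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for b in range(budget + 1):
            dp[i][b] = dp[i - 1][b]  # option: leave sentence i-1 uncompressed
            if costs[i - 1] <= b:
                take = dp[i - 1][b - costs[i - 1]] + values[i - 1]
                dp[i][b] = max(dp[i][b], take)
    # Backtrack to recover which sentences were selected for compression.
    chosen, b = [], budget
    for i in range(n, 0, -1):
        if dp[i][b] != dp[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    return dp[n][budget], sorted(chosen)
```

The table costs O(n × budget) time and space, which is tractable at paragraph scale since the budget is a small integer token count.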

Open-Mindedness and Style Coordination in Argumentative Discussions
Aviv Ben-Haim | Oren Tsur
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Linguistic accommodation is the process in which speakers adjust their accent, diction, vocabulary, and other aspects of language according to the communication style of one another. Previous research has shown how linguistic accommodation correlates with gaps in the power and status of the speakers and the way it promotes approval and discussion efficiency. In this work, we provide a novel perspective on the phenomenon, exploring its correlation with the open-mindedness of a speaker rather than with her social status. We process thousands of unstructured argumentative discussions that took place in Reddit’s Change My View (CMV) subreddit, demonstrating that open-mindedness relates to the assumed role of a speaker in different contexts. On the discussion level, we surprisingly find that discussions that reach agreement present lower levels of accommodation.
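Style coordination of this kind is commonly quantified, following Danescu-Niculescu-Mizil et al., as the increase in the probability that a reply exhibits a linguistic marker given that the preceding utterance did; a simplified sketch (not necessarily the paper's exact measure), where `marker_in` is a hypothetical marker detector:

```python
def coordination(pairs, marker_in):
    """Coordination of repliers toward initiators on one marker class:
    P(marker in reply | marker in preceding utterance) - P(marker in reply).
    `pairs` is a list of (utterance_a, reply_b) strings; positive values
    indicate accommodation on that marker."""
    if not pairs:
        return 0.0
    triggered = [b for a, b in pairs if marker_in(a)]
    if not triggered:
        return 0.0
    p_cond = sum(marker_in(b) for b in triggered) / len(triggered)
    p_base = sum(marker_in(b) for _, b in pairs) / len(pairs)
    return p_cond - p_base
```

Marker classes are typically function-word categories (pronouns, articles, quantifiers), since those are produced largely unconsciously and thus track accommodation rather than topic.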

2019

Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science
Svitlana Volkova | David Jurgens | Dirk Hovy | David Bamman | Oren Tsur
Proceedings of the Third Workshop on Natural Language Processing and Computational Social Science

2017

ConStance: Modeling Annotation Contexts to Improve Stance Classification
Kenneth Joseph | Lisa Friedland | William Hobbs | David Lazer | Oren Tsur
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Manual annotations are a prerequisite for many applications of machine learning. However, weaknesses in the annotation process itself are easy to overlook. In particular, scholars often choose what information to give to annotators without examining these decisions empirically. For subjective tasks such as sentiment analysis, sarcasm detection, and stance detection, such choices can impact results. Here, for the task of political stance detection on Twitter, we show that providing too little context can result in noisy and uncertain annotations, whereas providing too strong a context may cause the context to outweigh other signals. To characterize and reduce these biases, we develop ConStance, a general model for reasoning about annotations across information conditions. Given conflicting labels produced by multiple annotators seeing the same instances with different contexts, ConStance simultaneously estimates gold standard labels and learns a classifier for new instances. We show that the classifier learned by ConStance outperforms a variety of baselines at predicting political stance, while the model’s interpretable parameters shed light on the effects of each context.

Proceedings of the Second Workshop on NLP and Computational Social Science
Dirk Hovy | Svitlana Volkova | David Bamman | David Jurgens | Brendan O’Connor | Oren Tsur | A. Seza Doğruöz
Proceedings of the Second Workshop on NLP and Computational Social Science

2016

Proceedings of the First Workshop on NLP and Computational Social Science
David Bamman | A. Seza Doğruöz | Jacob Eisenstein | Dirk Hovy | David Jurgens | Brendan O’Connor | Alice Oh | Oren Tsur | Svitlana Volkova
Proceedings of the First Workshop on NLP and Computational Social Science

2015

A Frame of Mind: Using Statistical Models for Detection of Framing and Agenda Setting Campaigns
Oren Tsur | Dan Calacci | David Lazer
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media
Alice Oh | Benjamin Van Durme | David Yarowsky | Oren Tsur | Svitlana Volkova
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

As Long as You Name My Name Right: Social Circles and Social Sentiment in the Hollywood Hearings
Oren Tsur | Dan Calacci | David Lazer
Proceedings of the Joint Workshop on Social Dynamics and Personal Attributes in Social Media

2013

Authorship Attribution of Micro-Messages
Roy Schwartz | Oren Tsur | Ari Rappoport | Moshe Koppel
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

2010

Enhanced Sentiment Learning Using Twitter Hashtags and Smileys
Dmitry Davidov | Oren Tsur | Ari Rappoport
Coling 2010: Posters

Semi-Supervised Recognition of Sarcasm in Twitter and Amazon
Dmitry Davidov | Oren Tsur | Ari Rappoport
Proceedings of the Fourteenth Conference on Computational Natural Language Learning

2007

Using Classifier Features for Studying the Effect of Native Language on the Choice of Written Second Language Words
Oren Tsur | Ari Rappoport
Proceedings of the Workshop on Cognitive Aspects of Computational Language Acquisition

2004

BioGrapher: Biography Questions as a Restricted Domain Question Answering Task
Oren Tsur | Maarten de Rijke | Khalil Sima’an
Proceedings of the Conference on Question Answering in Restricted Domains