Shai Gretz


2023

pdf bib
Benchmark Data and Evaluation Framework for Intent Discovery Around COVID-19 Vaccine Hesitancy
Shai Gretz | Assaf Toledo | Roni Friedman | Dan Lahav | Rose Weeks | Naor Bar-Zeev | João Sedoc | Pooja Sangha | Yoav Katz | Noam Slonim
Findings of the Association for Computational Linguistics: EACL 2023

The COVID-19 pandemic has made a huge global impact and cost millions of lives. As COVID-19 vaccines were rolled out, they were quickly met with widespread hesitancy. To address the concerns of hesitant people, we launched VIRA, a public dialogue system aimed at addressing questions and concerns surrounding the COVID-19 vaccines. Here, we release VIRADialogs, a dataset of over 8k dialogues conducted by actual users with VIRA, providing a unique real-world conversational dataset. In light of rapid changes in users’ intents, due to updates in guidelines or in response to new information, we highlight the important task of intent discovery in this use-case. We introduce a novel automatic evaluation framework for intent discovery, leveraging the existing intent classifier of VIRA. We use this framework to report baseline intent discovery results over VIRADialogs, that highlight the difficulty of this task.

pdf bib
Zero-shot Topical Text Classification with LLMs - an Experimental Study
Shai Gretz | Alon Halfon | Ilya Shnayderman | Orith Toledo-Ronen | Artem Spector | Lena Dankin | Yannis Katsis | Ofir Arviv | Yoav Katz | Noam Slonim | Liat Ein-Dor
Findings of the Association for Computational Linguistics: EMNLP 2023

Topical Text Classification (TTC) is an ancient, yet timely research area in natural language processing, with many practical applications. The recent dramatic advancements in large LMs raise the question of how well these models can perform in this task in a zero-shot scenario. Here, we share a first comprehensive study, comparing the zero-shot performance of a variety of LMs over TTC23, a large benchmark collection of 23 publicly available TTC datasets, covering a wide range of domains and styles. In addition, we leverage this new TTC benchmark to create LMs that are specialized in TTC, by fine-tuning these LMs over a subset of the datasets and evaluating their performance over the remaining, held-out datasets. We show that the TTC-specialized LMs obtain the top performance on our benchmark, by a significant margin. Our code and model are made available for the community. We hope that the results presented in this work will serve as a useful guide for practitioners interested in topical text classification.

2020

pdf bib
The workweek is the best time to start a family – A Study of GPT-2 Based Claim Generation
Shai Gretz | Yonatan Bilu | Edo Cohen-Karlik | Noam Slonim
Findings of the Association for Computational Linguistics: EMNLP 2020

Argument generation is a challenging task whose research is timely considering its potential impact on social media and the dissemination of information. Here we suggest a pipeline based on GPT-2 for generating coherent claims, and explore the types of claims that it produces, and their veracity, using an array of manual and automatic assessments. In addition, we explore the interplay between this task and the task of Claim Retrieval, showing how they can complement one another.

2019

pdf bib
Automatic Argument Quality Assessment - New Datasets and Methods
Assaf Toledo | Shai Gretz | Edo Cohen-Karlik | Roni Friedman | Elad Venezian | Dan Lahav | Michal Jacovi | Ranit Aharonov | Noam Slonim
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We explore the task of automatic assessment of argument quality. To that end, we actively collected 6.3k arguments, more than a factor of five compared to previously examined data. Each argument was explicitly and carefully annotated for its quality. In addition, 14k pairs of arguments were annotated independently, identifying the higher quality argument in each pair. In spite of the inherent subjective nature of the task, both annotation schemes led to surprisingly consistent results. We release the labeled datasets to the community. Furthermore, we suggest neural methods based on a recently released language model, for argument ranking as well as for argument-pair classification. In the former task, our results are comparable to state-of-the-art; in the latter task our results significantly outperform earlier methods.

pdf bib
Towards Effective Rebuttal: Listening Comprehension Using Corpus-Wide Claim Mining
Tamar Lavee | Matan Orbach | Lili Kotlerman | Yoav Kantor | Shai Gretz | Lena Dankin | Michal Jacovi | Yonatan Bilu | Ranit Aharonov | Noam Slonim
Proceedings of the 6th Workshop on Argument Mining

Engaging in a live debate requires, among other things, the ability to effectively rebut arguments claimed by your opponent. In particular, this requires identifying these arguments. Here, we suggest doing so by automatically mining claims from a corpus of news articles containing billions of sentences, and searching for them in a given speech. This raises the question of whether such claims indeed correspond to those made in spoken speeches. To this end, we collected a large dataset of 400 speeches in English discussing 200 controversial topics, mined claims for each topic, and asked annotators to identify the mined claims mentioned in each speech. Results show that in the vast majority of speeches debaters indeed make use of such claims. In addition, we present several baselines for the automatic detection of mined claims in speeches, forming the basis for future work. All collected data is freely available for research.

2018

pdf bib
Towards an argumentative content search engine using weak supervision
Ran Levy | Ben Bogin | Shai Gretz | Ranit Aharonov | Noam Slonim
Proceedings of the 27th International Conference on Computational Linguistics

Searching for sentences containing claims in a large text corpus is a key component in developing an argumentative content search engine. Previous works focused on detecting claims in a small set of documents or within documents enriched with argumentative content. However, pinpointing relevant claims in massive unstructured corpora, received little attention. A step in this direction was taken in (Levy et al. 2017), where the authors suggested using a weak signal to develop a relatively strict query for claim–sentence detection. Here, we leverage this work to define weak signals for training DNNs to obtain significantly greater performance. This approach allows to relax the query and increase the potential coverage. Our results clearly indicate that the system is able to successfully generalize from the weak signal, outperforming previously reported results in terms of both precision and coverage. Finally, we adapt our system to solve a recent argument mining task of identifying argumentative sentences in Web texts retrieved from heterogeneous sources, and obtain F1 scores comparable to the supervised baseline.

2017

pdf bib
Unsupervised corpus–wide claim detection
Ran Levy | Shai Gretz | Benjamin Sznajder | Shay Hummel | Ranit Aharonov | Noam Slonim
Proceedings of the 4th Workshop on Argument Mining

Automatic claim detection is a fundamental argument mining task that aims to automatically mine claims regarding a topic of consideration. Previous works on mining argumentative content have assumed that a set of relevant documents is given in advance. Here, we present a first corpus– wide claim detection framework, that can be directly applied to massive corpora. Using simple and intuitive empirical observations, we derive a claim sentence query by which we are able to directly retrieve sentences in which the prior probability to include topic-relevant claims is greatly enhanced. Next, we employ simple heuristics to rank the sentences, leading to an unsupervised corpus–wide claim detection system, with precision that outperforms previously reported results on the task of claim detection given relevant documents and labeled data.