Alon Halfon


2023

pdf bib
Zero-shot Topical Text Classification with LLMs - an Experimental Study
Shai Gretz | Alon Halfon | Ilya Shnayderman | Orith Toledo-Ronen | Artem Spector | Lena Dankin | Yannis Katsis | Ofir Arviv | Yoav Katz | Noam Slonim | Liat Ein-Dor
Findings of the Association for Computational Linguistics: EMNLP 2023

Topical Text Classification (TTC) is an ancient, yet timely research area in natural language processing, with many practical applications. The recent dramatic advancements in large LMs raise the question of how well these models can perform in this task in a zero-shot scenario. Here, we share a first comprehensive study, comparing the zero-shot performance of a variety of LMs over TTC23, a large benchmark collection of 23 publicly available TTC datasets, covering a wide range of domains and styles. In addition, we leverage this new TTC benchmark to create LMs that are specialized in TTC, by fine-tuning these LMs over a subset of the datasets and evaluating their performance over the remaining, held-out datasets. We show that the TTC-specialized LMs obtain the top performance on our benchmark, by a significant margin. Our code and model are made available for the community. We hope that the results presented in this work will serve as a useful guide for practitioners interested in topical text classification.

2022

pdf bib
Zero-Shot Text Classification with Self-Training
Ariel Gera | Alon Halfon | Eyal Shnarch | Yotam Perlitz | Liat Ein-Dor | Noam Slonim
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recent advances in large pretrained language models have increased attention to zero-shot text classification. In particular, models finetuned on natural language inference datasets have been widely adopted as zero-shot classifiers due to their promising results and off-the-shelf availability. However, the fact that such models are unfamiliar with the target task can lead to instability and performance issues. We propose a plug-and-play method to bridge this gap using a simple self-training approach, requiring only the class names along with an unlabeled dataset, and without the need for domain expertise or trial and error. We show that fine-tuning the zero-shot classifier on its most confident predictions leads to significant performance gains across a wide range of text classification tasks, presumably since self-training adapts the zero-shot model to the task at hand.

pdf bib
Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours
Eyal Shnarch | Alon Halfon | Ariel Gera | Marina Danilevsky | Yannis Katsis | Leshem Choshen | Martin Santillan Cooper | Dina Epelboim | Zheng Zhang | Dakuo Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Label Sleuth is an open source platform for building text classifiers which does not require coding skills nor machine learning knowledge.- Project website: [https://www.label-sleuth.org/](https://www.label-sleuth.org/)- Link to screencast video: [https://vimeo.com/735675461](https://vimeo.com/735675461)### AbstractText classification can be useful in many real-world scenarios, saving a lot of time for end users. However, building a classifier generally requires coding skills and ML knowledge, which poses a significant barrier for many potential users. To lift this barrier we introduce *Label Sleuth*, a free open source system for labeling and creating text classifiers. This system is unique for: - being a no-code system, making NLP accessible for non-experts. - guiding its users throughout the entire labeling process until they obtain their desired classifier, making the process efficient - from cold start to a classifier in a few hours. - being open for configuration and extension by developers. By open sourcing Label Sleuth we hope to build a community of users and developers that will widen the utilization of NLP models.

pdf bib
Cluster & Tune: Boost Cold Start Performance in Text Classification
Eyal Shnarch | Ariel Gera | Alon Halfon | Lena Dankin | Leshem Choshen | Ranit Aharonov | Noam Slonim
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In real-world scenarios, a text classification task often begins with a cold start, when labeled data is scarce. In such cases, the common practice of fine-tuning pre-trained models, such as BERT, for a target classification task, is prone to produce poor performance. We suggest a method to boost the performance of such models by adding an intermediate unsupervised classification task, between the pre-training and fine-tuning phases. As such an intermediate task, we perform clustering and train the pre-trained model on predicting the cluster labels. We test this hypothesis on various data sets, and show that this additional classification phase can significantly improve performance, mainly for topical classification tasks, when the number of labeled instances available for fine-tuning is only a couple of dozen to a few hundred.

2020

pdf bib
Active Learning for BERT: An Empirical Study
Liat Ein-Dor | Alon Halfon | Ariel Gera | Eyal Shnarch | Lena Dankin | Leshem Choshen | Marina Danilevsky | Ranit Aharonov | Yoav Katz | Noam Slonim
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Real world scenarios present a challenge for text classification, since labels are usually expensive and the data is often characterized by class imbalance. Active Learning (AL) is a ubiquitous paradigm to cope with data scarcity. Recently, pre-trained NLP models, and BERT in particular, are receiving massive attention due to their outstanding performance in various NLP tasks. However, the use of AL with deep pre-trained models has so far received little consideration. Here, we present a large-scale empirical study on active learning techniques for BERT-based classification, addressing a diverse set of AL strategies and datasets. We focus on practical scenarios of binary text classification, where the annotation budget is very small, and the data is often skewed. Our results demonstrate that AL can boost BERT performance, especially in the most realistic scenario in which the initial set of labeled examples is created using keyword-based queries, resulting in a biased sample of the minority class. We release our research framework, aiming to facilitate future research along the lines explored here.

2019

pdf bib
Syntactic Interchangeability in Word Embedding Models
Daniel Hershcovich | Assaf Toledo | Alon Halfon | Noam Slonim
Proceedings of the 3rd Workshop on Evaluating Vector Space Representations for NLP

Nearest neighbors in word embedding models are commonly observed to be semantically similar, but the relations between them can vary greatly. We investigate the extent to which word embedding models preserve syntactic interchangeability, as reflected by distances between word vectors, and the effect of hyper-parameters—context window size in particular. We use part of speech (POS) as a proxy for syntactic interchangeability, as generally speaking, words with the same POS are syntactically valid in the same contexts. We also investigate the relationship between interchangeability and similarity as judged by commonly-used word similarity benchmarks, and correlate the result with the performance of word embedding models on these benchmarks. Our results will inform future research and applications in the selection of word embedding model, suggesting a principle for an appropriate selection of the context window size parameter depending on the use-case.

pdf bib
Financial Event Extraction Using Wikipedia-Based Weak Supervision
Liat Ein-Dor | Ariel Gera | Orith Toledo-Ronen | Alon Halfon | Benjamin Sznajder | Lena Dankin | Yonatan Bilu | Yoav Katz | Noam Slonim
Proceedings of the Second Workshop on Economics and Natural Language Processing

Extraction of financial and economic events from text has previously been done mostly using rule-based methods, with more recent works employing machine learning techniques. This work is in line with this latter approach, leveraging relevant Wikipedia sections to extract weak labels for sentences describing economic events. Whereas previous weakly supervised approaches required a knowledge-base of such events, or corresponding financial figures, our approach requires no such additional data, and can be employed to extract economic events related to companies which are not even mentioned in the training data.

pdf bib
From Surrogacy to Adoption; From Bitcoin to Cryptocurrency: Debate Topic Expansion
Roy Bar-Haim | Dalia Krieger | Orith Toledo-Ronen | Lilach Edelstein | Yonatan Bilu | Alon Halfon | Yoav Katz | Amir Menczel | Ranit Aharonov | Noam Slonim
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

When debating a controversial topic, it is often desirable to expand the boundaries of discussion. For example, we may consider the pros and cons of possible alternatives to the debate topic, make generalizations, or give specific examples. We introduce the task of Debate Topic Expansion - finding such related topics for a given debate topic, along with a novel annotated dataset for the task. We focus on relations between Wikipedia concepts, and show that they differ from well-studied lexical-semantic relations such as hypernyms, hyponyms and antonyms. We present algorithms for finding both consistent and contrastive expansions and demonstrate their effectiveness empirically. We suggest that debate topic expansion may have various use cases in argumentation mining.

2018

pdf bib
Semantic Relatedness of Wikipedia Concepts – Benchmark Data and a Working Solution
Liat Ein Dor | Alon Halfon | Yoav Kantor | Ran Levy | Yosi Mass | Ruty Rinott | Eyal Shnarch | Noam Slonim
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Learning Thematic Similarity Metric from Article Sections Using Triplet Networks
Liat Ein Dor | Yosi Mass | Alon Halfon | Elad Venezian | Ilya Shnayderman | Ranit Aharonov | Noam Slonim
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper we suggest to leverage the partition of articles into sections, in order to learn thematic similarity metric between sentences. We assume that a sentence is thematically closer to sentences within its section than to sentences from other sections. Based on this assumption, we use Wikipedia articles to automatically create a large dataset of weakly labeled sentence triplets, composed of a pivot sentence, one sentence from the same section and one from another section. We train a triplet network to embed sentences from the same section closer. To test the performance of the learned embeddings, we create and release a sentence clustering benchmark. We show that the triplet network learns useful thematic metrics, that significantly outperform state-of-the-art semantic similarity methods and multipurpose embeddings on the task of thematic clustering of sentences. We also show that the learned embeddings perform well on the task of sentence semantic similarity prediction.

pdf bib
Learning Sentiment Composition from Sentiment Lexicons
Orith Toledo-Ronen | Roy Bar-Haim | Alon Halfon | Charles Jochim | Amir Menczel | Ranit Aharonov | Noam Slonim
Proceedings of the 27th International Conference on Computational Linguistics

Sentiment composition is a fundamental sentiment analysis problem. Previous work relied on manual rules and manually-created lexical resources such as negator lists, or learned a composition function from sentiment-annotated phrases or sentences. We propose a new approach for learning sentiment composition from a large, unlabeled corpus, which only requires a word-level sentiment lexicon for supervision. We automatically generate large sentiment lexicons of bigrams and unigrams, from which we induce a set of lexicons for a variety of sentiment composition processes. The effectiveness of our approach is confirmed through manual annotation, as well as sentiment classification experiments with both phrase-level and sentence-level benchmarks.