Boaz Carmeli

2024

Concept-Best-Matching: Evaluating Compositionality In Emergent Communication
Boaz Carmeli | Yonatan Belinkov | Ron Meir
Findings of the Association for Computational Linguistics: ACL 2024

Artificial agents that learn to communicate in order to accomplish a given task acquire communication protocols that are typically opaque to a human. A large body of work has attempted to evaluate the emergent communication via various evaluation measures, with **compositionality** featuring as a prominent desired trait. However, current evaluation procedures do not directly expose the compositionality of the emergent communication. We propose a procedure to assess the compositionality of emergent communication by finding the best-match between emerged words and natural language concepts.The best-match algorithm provides both a global score and a translation-map from emergent words to natural language concepts. To the best of our knowledge, it is the first time that such direct and interpretable mapping between emergent words and human concepts is provided.

pdf bib abs

More Bang for your Context: Virtual Documents for Question Answering over Long Documents
Yosi Mass | Boaz Carmeli | Asaf Yehudai | Assaf Toledo | Nathaniel Mills
Findings of the Association for Computational Linguistics: EMNLP 2024

We deal with the problem of Question Answering (QA) over a long document, which poses a challenge for modern Large Language Models (LLMs). Although LLMs can handle increasingly longer context windows, they struggle to effectively utilize the long content. To address this issue, we introduce the concept of a virtual document (VDoc). A VDoc is created by selecting chunks from the original document that are most likely to contain the information needed to answer the user’s question, while ensuring they fit within the LLM’s context window. We hypothesize that providing a short and focused VDoc to the LLM is more effective than filling the entire context window with less relevant information. Our experiments confirm this hypothesis and demonstrate that using VDocs improves results on the QA task.

2022

pdf bib abs

Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora
George Kour | Samuel Ackerman | Eitan Daniel Farchi | Orna Raz | Boaz Carmeli | Ateret Anaby Tavor
Proceedings of the Second Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

Similarity metrics for text corpora are becoming critical due to the tremendous growth in the number of generative models. These similarity metrics measure the semantic gap between human and machine-generated text on the corpus level. However, standard methods for evaluating the characteristics of these metrics have yet to be established. We propose a set of automatic measures for evaluating the characteristics of semantic similarity metrics for text corpora. Our measures allow us to sensibly compare and identify the strengths and weaknesses of these metrics. We demonstrate the effectiveness of our evaluation measures in capturing fundamental characteristics by comparing it to a collection of classical and state-of-the-art metrics. Our measures revealed that recent metrics are becoming better in identifying semantic distributional mismatch while classical metrics are more sensitive to perturbations in the surface text levels.

pdf bib abs

Exploration of the Usage of Color Terms by Color-blind Participants in Online Discussion Platforms
Ella Rabinovich | Boaz Carmeli
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Prominent questions about the role of sensory vs. linguistic input in the way we acquire and use language have been extensively studied in the psycholinguistic literature. However, the relative effect of various factors in a person’s overall experience on their linguistic system remains unclear. We study this question by making a step forward towards a better understanding of the conceptual perception of colors by color-blind individuals, as reflected in their spontaneous linguistic productions. Using a novel and carefully curated dataset, we show that red-green color-blind speakers use the “red” and “green” color terms in less predictable contexts, and in linguistic environments evoking mental image to a lower extent, when compared to their normal-sighted counterparts. These findings shed some new and interesting light on the role of sensory experience on our linguistic system.

2020

pdf bib abs

Data balancing is a known technique for improving the performance of classification tasks. In this work we define a novel balancing-viageneration framework termed BalaGen. BalaGen consists of a flexible balancing policy coupled with a text generation mechanism. Combined, these two techniques can be used to augment a dataset for more balanced distribution. We evaluate BalaGen on three publicly available semantic utterance classification (SUC) datasets. One of these is a new COVID-19 Q&A dataset published here for the first time. Our work demonstrates that optimal balancing policies can significantly improve classifier performance, while augmenting just part of the classes and under-sampling others. Furthermore, capitalizing on the advantages of balancing, we show its usefulness in all relevant BalaGen framework components. We validate the superiority of BalaGen on ten semantic utterance datasets taken from real-life goaloriented dialogue systems. Based on our results we encourage using data balancing prior to training for text classification tasks.

pdf bib abs

Unsupervised FAQ Retrieval with Question Generation and BERT
Yosi Mass | Boaz Carmeli | Haggai Roitman | David Konopnicki
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We focus on the task of Frequently Asked Questions (FAQ) retrieval. A given user query can be matched against the questions and/or the answers in the FAQ. We present a fully unsupervised method that exploits the FAQ pairs to train two BERT models. The two models match user queries to FAQ answers and questions, respectively. We alleviate the missing labeled data of the latter by automatically generating high-quality question paraphrases. We show that our model is on par and even outperforms supervised models on existing datasets.