Noam Koenigstein

2024

InterrogateLLM: Zero-Resource Hallucination Detection in LLM-Generated Answers
Yakir Yehuda | Itzik Malkiel | Oren Barkan | Jonathan Weill | Royi Ronen | Noam Koenigstein
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite the many advances of Large Language Models (LLMs) and their unprecedented rapid evolution, their impact and integration into every facet of our daily lives is limited due to various reasons. One critical factor hindering their widespread adoption is the occurrence of hallucinations, where LLMs invent answers that sound realistic, yet drift away from factual truth. In this paper, we present a novel method for detecting hallucinations in large language models, which tackles a critical issue in the adoption of these models in various real-world scenarios. Through extensive evaluations across multiple datasets and LLMs, including Llama-2, we study the hallucination levels of various recent LLMs and demonstrate the effectiveness of our method to automatically detect them. Notably, we observe up to 87% hallucinations for Llama-2 in a specific experiment, where our method achieves a Balanced Accuracy of 81%, all without relying on external knowledge.

pdf bib abs

Improving LLM Attributions with Randomized Path-Integration
Oren Barkan | Yehonatan Elisha | Yonatan Toib | Jonathan Weill | Noam Koenigstein
Findings of the Association for Computational Linguistics: EMNLP 2024

We present Randomized Path-Integration (RPI) - a path-integration method for explaining language models via randomization of the integration path over the attention information in the model. RPI employs integration on internal attention scores and their gradients along a randomized path, which is dynamically established between a baseline representation and the attention scores of the model. The inherent randomness in the integration path originates from modeling the baseline representation as a randomly drawn tensor from a Gaussian diffusion process. As a consequence, RPI generates diverse baselines, yielding a set of candidate attribution maps. This set facilitates the selection of the most effective attribution map based on the specific metric at hand. We present an extensive evaluation, encompassing 11 explanation methods and 5 language models, including the Llama2 and Mistral models. Our results demonstrate that RPI outperforms latest state-of-the-art methods across 4 datasets and 5 evaluation metrics.

pdf bib abs

LLM Explainability via Attributive Masking Learning
Oren Barkan | Yonatan Toib | Yehonatan Elisha | Jonathan Weill | Noam Koenigstein
Findings of the Association for Computational Linguistics: EMNLP 2024

In this paper, we introduce Attributive Masking Learning (AML), a method designed for explaining language model predictions by learning input masks. AML trains an attribution model to identify influential tokens in the input for a given language model’s prediction. The central concept of AML is to train an auxiliary attribution model to simultaneously 1) mask as much input data as possible while ensuring that the language model’s prediction closely aligns with its prediction on the original input, and 2) ensure a significant change in the model’s prediction when applying the inverse (complement) of the same mask to the input. This dual-masking approach further enables the optimization of the explanation w.r.t. the metric of interest. We demonstrate the effectiveness of AML on both encoder-based and decoder-based language models, showcasing its superiority over a variety of state-of-the-art explanation methods on multiple benchmarks.

2021

pdf bib

Self-Supervised Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference
Dvir Ginzburg | Itzik Malkiel | Oren Barkan | Avi Caciularu | Noam Koenigstein
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

pdf bib abs

Language models that utilize extensive self-supervised pre-training from unlabeled text, have recently shown to significantly advance the state-of-the-art performance in a variety of language understanding tasks. However, it is yet unclear if and how these recent models can be harnessed for conducting text-based recommendations. In this work, we introduce RecoBERT, a BERT-based approach for learning catalog-specialized language models for text-based item recommendations. We suggest novel training and inference procedures for scoring similarities between pairs of items, that don’t require item similarity labels. Both the training and the inference techniques were designed to utilize the unlabeled structure of textual catalogs, and minimize the discrepancy between them. By incorporating four scores during inference, RecoBERT can infer text-based item-to-item similarities more accurately than other techniques. In addition, we introduce a new language understanding task for wine recommendations using similarities based on professional wine reviews. As an additional contribution, we publish annotated recommendations dataset crafted by human wine experts. Finally, we evaluate RecoBERT and compare it to various state-of-the-art NLP models on wine and fashion recommendations tasks.

pdf bib abs

Bayesian Hierarchical Words Representation Learning
Oren Barkan | Idan Rejwan | Avi Caciularu | Noam Koenigstein
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

This paper presents the Bayesian Hierarchical Words Representation (BHWR) learning algorithm. BHWR facilitates Variational Bayes word representation learning combined with semantic taxonomy modeling via hierarchical priors. By propagating relevant information between related words, BHWR utilizes the taxonomy to improve the quality of such representations. Evaluation of several linguistic datasets demonstrates the advantages of BHWR over suitable alternatives that facilitate Bayesian modeling with or without semantic priors. Finally, we further show that BHWR produces better representations for rare words.

Co-authors

Venues

Findings4
ACL2

Fix author