Moshe Berchansky

2024

CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity
Moshe Berchansky | Daniel Fleischer | Moshe Wasserblat | Peter Izsak
Findings of the Association for Computational Linguistics: EMNLP 2024

State-of-the-art performance in QA tasks is currently achieved by systems employing Large Language Models (LLMs); however, these models tend to hallucinate information in their responses. One approach focuses on enhancing the generation process by incorporating attribution from the given input to the output. However, identifying appropriate attributions and verifying their accuracy against a source is a complex task that requires significant improvements in assessing such systems. We introduce an attribution-oriented Chain-of-Thought reasoning method to enhance the accuracy of attributions. This approach focuses the reasoning process on generating an attribution-centric output. Evaluations on two context-enhanced question-answering datasets using GPT-4 demonstrate improved accuracy and correctness of attributions. In addition, the combination of our method with finetuning enhances the response and attribution accuracy of two smaller LLMs, showing their potential to outperform GPT-4 in some cases.

2023

Optimizing Retrieval-augmented Reader Models via Token Elimination
Moshe Berchansky | Peter Izsak | Avi Caciularu | Ido Dagan | Moshe Wasserblat
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Fusion-in-Decoder (FiD) is an effective retrieval-augmented language model applied across a variety of open-domain tasks, such as question answering and fact checking. In FiD, supporting passages are first retrieved and then processed using a generative model (Reader), which can cause a significant bottleneck in decoding time, particularly with long outputs. In this work, we analyze the contribution and necessity of all the retrieved passages to the performance of reader models, and propose eliminating some of the retrieved information, at the token level, that might not contribute essential information to the answer generation process. We demonstrate that our method can reduce run-time by up to 62.2%, with only a 2% reduction in performance, and in some cases, even improve the performance results.

2021

How to Train BERT with an Academic Budget
Peter Izsak | Moshe Berchansky | Omer Levy
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

While large language models a la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-base on GLUE tasks at a fraction of the original pretraining cost.

Co-authors

Omer Levy (1)

Venues

EMNLP (2)
Findings (1)
