Aleksandr Rubashevskii


2024

pdf bib
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
Ekaterina Fadeeva | Aleksandr Rubashevskii | Artem Shelmanov | Sergey Petrakov | Haonan Li | Hamdy Mubarak | Evgenii Tsymbalov | Gleb Kuzmin | Alexander Panchenko | Timothy Baldwin | Preslav Nakov | Maxim Panov
Findings of the Association for Computational Linguistics: ACL 2024

Large language models (LLMs) are notorious for hallucinating, i.e., producing erroneous claims in their output. Such hallucinations can be dangerous, as occasional factual inaccuracies in the generated text might be obscured by the rest of the output being generally factually correct, making it extremely hard for the users to spot them. Current services that leverage LLMs usually do not provide any means for detecting unreliable generations. Here, we aim to bridge this gap. In particular, we propose a novel fact-checking and hallucination detection pipeline based on token-level uncertainty quantification. Uncertainty scores leverage information encapsulated in the output of a neural network or its layers to detect unreliable predictions, and we show that they can be used to fact-check the atomic claims in the LLM output. Moreover, we present a novel token-level uncertainty quantification method that removes the impact of uncertainty about what claim to generate on the current step and what surface form to use. Our method Claim Conditioned Probability (CCP) measures only the uncertainty of a particular claim value expressed by the model. Experiments on the task of biography generation demonstrate strong improvements for CCP compared to the baselines for seven different LLMs and four languages. Human evaluation reveals that the fact-checking pipeline based on uncertainty quantification is competitive with a fact-checking tool that leverages external knowledge.

pdf bib
Factcheck-Bench: Fine-Grained Evaluation Benchmark for Automatic Fact-checkers
Yuxia Wang | Revanth Gangi Reddy | Zain Muhammad Mujahid | Arnav Arora | Aleksandr Rubashevskii | Jiahui Geng | Osama Mohammed Afzal | Liangming Pan | Nadav Borenstein | Aditya Pillai | Isabelle Augenstein | Iryna Gurevych | Preslav Nakov
Findings of the Association for Computational Linguistics: EMNLP 2024

The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. In this work, we present Factcheck-Bench, a holistic end-to-end framework for annotating and evaluating the factuality of LLM-generated responses, which encompasses a multi-stage annotation scheme designed to yield detailed labels for fact-checking and correcting not just the final prediction, but also the intermediate steps that a fact-checking system might need to take. Based on this framework, we construct an open-domain factuality benchmark in three-levels of granularity: claim, sentence, and document. We further propose a system, Factcheck-GPT, which follows our framework, and we show that it outperforms several popular LLM fact-checkers. We make our annotation tool, annotated data, benchmark, and code available at https://github.com/yuxiaw/Factcheck-GPT.

2022

pdf bib
ALToolbox: A Set of Tools for Active Learning Annotation of Natural Language Texts
Akim Tsvigun | Leonid Sanochkin | Daniil Larionov | Gleb Kuzmin | Artem Vazhentsev | Ivan Lazichny | Nikita Khromov | Danil Kireev | Aleksandr Rubashevskii | Olga Shahmatova | Dmitry V. Dylov | Igor Galitskiy | Artem Shelmanov
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

We present ALToolbox – an open-source framework for active learning (AL) annotation in natural language processing. Currently, the framework supports text classification, sequence tagging, and seq2seq tasks. Besides state-of-the-art query strategies, ALToolbox provides a set of tools that help to reduce computational overhead and duration of AL iterations and increase annotated data reusability. The framework aims to support data scientists and researchers by providing an easy-to-deploy GUI annotation tool directly in the Jupyter IDE and an extensible benchmark for novel AL methods. We prepare a small demonstration of ALToolbox capabilities available online. The code of the framework is published under the MIT license.