2024
pdf
bib
abs
Multi-Level Information Retrieval Augmented Generation for Knowledge-based Visual Question Answering
Omar Adjali
|
Olivier Ferret
|
Sahar Ghannay
|
Hervé Le Borgne
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
The Knowledge-Aware Visual Question Answering about Entity task aims to disambiguate entities using textual and visual information, as well as knowledge. It usually relies on two independent steps, information retrieval then reading comprehension, that do not benefit each other. Retrieval Augmented Generation (RAG) offers a solution by using generated answers as feedback for retrieval training. RAG usually relies solely on pseudo-relevant passages retrieved from external knowledge bases which can lead to ineffective answer generation. In this work, we propose a multi-level information RAG approach that enhances answer generation through entity retrieval and query expansion. We formulate a joint-training RAG loss such that answer generation is conditioned on both entity and passage retrievals. We show through experiments new state-of-the-art performance on the VIQuAE KB-VQA benchmark and demonstrate that our approach can help retrieve more actual relevant knowledge to generate accurate answers.
pdf
bib
abs
Exploring Retrieval Augmented Generation For Real-world Claim Verification
Omar Adjali
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)
Automated Fact-Checking (AFC) has recently gained considerable attention to address the increasing misinformation spreading in the web and social media. The recently introduced AVeriTeC dataset alleviates some limitations of existing AFC benchmarks. In this paper, we propose to explore Retrieval Augmented Generation (RAG) and describe the system (UPS participant) we implemented to solve the AVeriTeC shared task.Our end-to-end system integrates retrieval and generation in a joint training setup to enhance evidence retrieval and question generation. Our system operates as follows: First, we conduct dense retrieval of evidence by encoding candidate evidence sentences from the provided knowledge store documents. Next, we perform a secondary retrieval of question-answer pairs from the training set, encoding these into dense vectors to support question generation with relevant in-context examples. During training, the question generator is optimized to generate questions based on retrieved or gold evidence. In preliminary automatic evaluation, our system achieved respectively 0.198 and 0.210 AVeriTeC scores on the dev and test sets.
2022
pdf
bib
abs
Building Comparable Corpora for Assessing Multi-Word Term Alignment
Omar Adjali
|
Emmanuel Morin
|
Pierre Zweigenbaum
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Recent work has demonstrated the importance of dealing with Multi-Word Terms (MWTs) in several Natural Language Processing applications. In particular, MWTs pose serious challenges for alignment and machine translation systems because of their syntactic and semantic properties. Thus, developing algorithms that handle MWTs is becoming essential for many NLP tasks. However, the availability of bilingual and more generally multi-lingual resources is limited, especially for low-resourced languages and in specialized domains. In this paper, we propose an approach for building comparable corpora and bilingual term dictionaries that help evaluate bilingual term alignment in comparable corpora. To that aim, we exploit parallel corpora to perform automatic bilingual MWT extraction and comparable corpus construction. Parallel information helps to align bilingual MWTs and makes it easier to build comparable specialized sub-corpora. Experimental validation on an existing dataset and on manually annotated data shows the interest of the proposed methodology.
pdf
bib
abs
OFU@SMM4H’22: Mining Advent Drug Events Using Pretrained Language Models
Omar Adjali
|
Fréjus A. A. Laleye
|
Umang Aggarwal
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task
We describe in this paper our proposed systems for the Social Media Mining for Health 2022 shared task 1. In particular, we participated in the three sub-tasks, tasks that aim at extracting and processing Adverse Drug Events. We investigate different transformer-based pretrained models we fine-tuned on each task and proposed some improvement on the task of entity normalization.
2020
pdf
bib
abs
Building a Multimodal Entity Linking Dataset From Tweets
Omar Adjali
|
Romaric Besançon
|
Olivier Ferret
|
Hervé Le Borgne
|
Brigitte Grau
Proceedings of the Twelfth Language Resources and Evaluation Conference
The task of Entity linking, which aims at associating an entity mention with a unique entity in a knowledge base (KB), is useful for advanced Information Extraction tasks such as relation extraction or event detection. Most of the studies that address this problem rely only on textual documents while an increasing number of sources are multimedia, in particular in the context of social media where messages are often illustrated with images. In this article, we address the Multimodal Entity Linking (MEL) task, and more particularly the problem of its evaluation. To this end, we propose a novel method to quasi-automatically build annotated datasets to evaluate methods on the MEL task. The method collects text and images to jointly build a corpus of tweets with ambiguous mentions along with a Twitter KB defining the entities. We release a new annotated dataset of Twitter posts associated with images. We study the key characteristics of the proposed dataset and evaluate the performance of several MEL approaches on it.