Yifan Ethan Xu
2024
SnapNTell: Enhancing Entity-Centric Visual Question Answering with Retrieval Augmented Multimodal LLM
Jielin Qiu
|
Andrea Madotto
|
Zhaojiang Lin
|
Paul A. Crook
|
Yifan Ethan Xu
|
Babak Damavandi
|
Xin Luna Dong
|
Christos Faloutsos
|
Lei Li
|
Seungwhan Moon
Findings of the Association for Computational Linguistics: EMNLP 2024
Vision-extended LLMs have made significant strides in Visual Question Answering (VQA). Despite these advancements, VLLMs still encounter substantial difficulties in handling queries involving long-tail entities, with a tendency to produce erroneous or hallucinated responses. In this work, we introduce a novel evaluative benchmark named SnapNTell, specifically tailored for entity-centric VQA. This task aims to test the models’ capabilities in identifying entities and providing detailed, entity-specific knowledge. We have developed the SnapNTell Dataset, distinct from traditional VQA datasets: (1) It encompasses a wide range of categorized entities, each represented by images and explicitly named in the answers; (2) It features QA pairs that require extensive knowledge for accurate responses. The dataset is organized into 22 major categories, containing 7,568 unique entities in total. For each entity, we curated 10 illustrative images and crafted 10 knowledge-intensive QA pairs. To address this novel task, we devised a scalable, efficient, and transparent retrieval-augmented multimodal LLM. Our approach markedly outperforms existing methods on the SnapNTell dataset, achieving a 66.5% improvement in the BELURT score.
2023
Tab-Cleaner: Weakly Supervised Tabular Data Cleaning via Pre-training for E-commerce Catalog
Kewei Cheng
|
Xian Li
|
Zhengyang Wang
|
Chenwei Zhang
|
Binxuan Huang
|
Yifan Ethan Xu
|
Xin Luna Dong
|
Yizhou Sun
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Product catalogs, conceptually in the form of text-rich tables, are self-reported by individual retailers and thus inevitably contain noisy facts. Verifying such textual attributes in product catalogs is essential to improve their reliability. However, popular methods for processing free-text content, such as pre-trained language models, are not particularly effective on structured tabular data since they are typically trained on free-form natural language texts. In this paper, we present Tab-Cleaner, a model designed to handle error detection over text-rich tabular data following a pre-training / fine-tuning paradigm. We train Tab-Cleaner on a real-world Amazon Product Catalog table w.r.t millions of products and show improvements over state-of-the-art methods by 16\% on PR AUC over attribute applicability classification task and by 11\% on PR AUC over attribute value validation task.
Search
Fix data
Co-authors
- Xin Luna Dong 2
- Kewei Cheng 1
- Paul A. Crook 1
- Babak Damavandi 1
- Christos Faloutsos 1
- show all...