Yaser Yacoob


2024

pdf bib
MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
Fuxiao Liu | Xiaoyang Wang | Wenlin Yao | Jianshu Chen | Kaiqiang Song | Sangwoo Cho | Yaser Yacoob | Dong Yu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

With the rapid development of large language models (LLMs) and their integration into large multimodal models (LMMs), there has beenimpressive progress in zero-shot completion of user-oriented vision-language tasks. However, a gap remains in the domain of chartimage understanding due to the distinct abstract components in charts. To address this, we introduce a large-scale MultiModal ChartInstruction (MMC-Instruction) dataset comprising 600k instances supporting diverse tasks and chart types. Leveraging this data, we de-velop MultiModal Chart Assistant (MMCA), an LMM that achieves state-of-the-art performance on existing chart QA benchmarks. Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose a MultiModal Chart Benchmark (MMC-Benchmark), a comprehensive human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts.Extensive experiments on MMC-Benchmark reveal the limitations of existing LMMs on correctly interpreting charts, even for the mostrecent GPT-4V model. Our work provides an instruction-tuning methodology and benchmark to advance multimodal understanding ofcharts. Code and data are available at https://github.com/FuxiaoLiu/MMC.

2023

pdf bib
COVID-VTS: Fact Extraction and Verification on Short Video Platforms
Fuxiao Liu | Yaser Yacoob | Abhinav Shrivastava
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

We introduce a new benchmark, COVID-VTS, for fact-checking multi-modal information involving short-duration videos with COVID19- focused information from both the real world and machine generation. We propose, TwtrDetective, an effective model incorporating cross-media consistency checking to detect token-level malicious tampering in different modalities, and generate explanations. Due to the scarcity of training data, we also develop an efficient and scalable approach to automatically generate misleading video posts by event manipulation or adversarial matching. We investigate several state-of-the-art models and demonstrate the superiority of TwtrDetective.