Jindong Gu
2024
Visual Question Decomposition on Multimodal Large Language Models
Haowei Zhang | Jianzhe Liu | Zhen Han | Shuo Chen | Bailan He | Volker Tresp | Zhiqiang Xu | Jindong Gu
Findings of the Association for Computational Linguistics: EMNLP 2024
Question decomposition has emerged as an effective strategy for prompting Large Language Models (LLMs) to answer complex questions. However, existing methods focus primarily on unimodal language models, and the question decomposition capability of Multimodal Large Language Models (MLLMs) has yet to be explored. To this end, this paper explores visual question decomposition on MLLMs. Specifically, we introduce a systematic evaluation framework, including a dataset and several evaluation criteria, to assess the quality of the decomposed sub-questions, revealing that existing MLLMs struggle to produce high-quality sub-questions. To address this limitation, we propose a specific finetuning dataset, DecoVQA+, for enhancing the model’s question decomposition capability. To enable models to perform appropriate selective decomposition, we propose an efficient finetuning pipeline consisting of our proposed dataset and a training objective for selective decomposition. Finetuned MLLMs demonstrate significant improvements in the quality of sub-questions and the policy of selective question decomposition. The models also achieve higher accuracy with selective decomposition on VQA benchmark datasets.
2023
ECOLA: Enhancing Temporal Knowledge Embeddings with Contextualized Language Representations
Zhen Han | Ruotong Liao | Jindong Gu | Yao Zhang | Zifeng Ding | Yujia Gu | Heinz Koeppl | Hinrich Schütze | Volker Tresp
Findings of the Association for Computational Linguistics: ACL 2023
Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts. However, existing enhancement approaches cannot be applied to temporal knowledge graphs (tKGs), which contain time-dependent event knowledge with complex temporal dynamics. Specifically, existing enhancement approaches often assume that knowledge embedding is time-independent. In contrast, the entity embedding in tKG models usually evolves over time, which poses the challenge of aligning temporally relevant texts with entities. To this end, we propose to study enhancing temporal knowledge embedding with textual data. As an approach to this task, we propose Enhanced Temporal Knowledge Embeddings with Contextualized Language Representations (ECOLA), which takes the temporal aspect into account and injects textual information into temporal knowledge embedding. We introduce three new datasets for training and evaluating ECOLA. Extensive experiments show that ECOLA significantly enhances temporal KG embedding models, with up to 287% relative improvement in Hits@1 on the link prediction task. The code and models are publicly available at https://github.com/mayhugotong/ECOLA.