Yin Wu
2024
MAR: Matching-Augmented Reasoning for Enhancing Visual-based Entity Question Answering
Zhengxuan Zhang
|
Yin Wu
|
Yuyu Luo
|
Nan Tang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
A multimodal large language model MLLMs may struggle with answering visual-based (personal) entity questions (VEQA), such as ”who is A?” or ”who is A that B is talking to?” for various reasons, e.g., the absence of the name of A in the caption or the inability of MLLMs to recognize A, particularly for less common entities. Furthermore, even if the MLLMs can identify A, it may refrain from answering due to privacy concerns. In this paper, we introduce a novel method called Matching-Augmented Reasoning (MAR) to enhance VEQA. Given a collection of visual objects with captions, MAR preprocesses each object individually, identifying faces, names, and their alignments within the object. It encodes this information and stores their vector representations in vector databases. When handling VEQA, MAR retrieves matching faces and names and organizes these entities into a matching graph. MAR then derives the answer to the query by reasoning over this matching graph. Extensive experiments show that MAR significantly improves VEQA compared with the state-of-the-art methods using MLLMs.
2023
Negation Scope Refinement via Boundary Shift Loss
Yin Wu
|
Aixin Sun
Findings of the Association for Computational Linguistics: ACL 2023
Negation in natural language may affect many NLP applications, e.g., information extraction and sentiment analysis. The key sub-task of negation detection is negation scope resolution which aims to extract the portion of a sentence that is being negated by a negation cue (e.g., keyword “not” and never”) in the sentence. Due to the long spans, existing methods tend to make wrong predictions around the scope boundaries. In this paper, we propose a simple yet effective model named R-BSL which engages the Boundary Shift Loss to refine the predicted boundary. On multiple benchmark datasets, we show that the extremely simple R-BSL achieves best results.