Weicai Yan
2025
BrainLoc: Brain Signal-Based Object Detection with Multi-modal Alignment
Jiaqi Duan | Xiaoda Yang | Kaixuan Luan | Hongshun Qiu | Weicai Yan | Xueyi Zhang | Youliang Zhang | Zhaoyang Li | Donglin Huang | JunYu Lu | Ziyue Jiang | Xifeng Yang
Findings of the Association for Computational Linguistics: EMNLP 2025
Object detection is a core challenge in computer vision. Traditional methods primarily rely on intermediate modalities such as text, speech, or visual cues to interpret user intent, leading to inefficient and potentially distorted expression of that intent. Brain signals, particularly fMRI signals, have emerged as a novel modality that can directly reflect user intent, eliminating ambiguities introduced during modality conversion. However, brain signal-based object detection still faces challenges in accuracy and robustness. To address these challenges, we present BrainLoc, a lightweight object detection model guided by fMRI signals. First, we employ a multi-modal alignment strategy that enhances fMRI signal feature extraction by incorporating various modalities, including images and text. Second, we propose a cross-domain fusion module that promotes interaction between fMRI features and category features, improving the representation of category information in fMRI signals. Extensive experiments demonstrate that BrainLoc achieves state-of-the-art performance on brain signal-based object detection tasks, showing significant advantages in both accuracy and convenience.
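The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the two ideas it names: a contrastive loss that aligns fMRI embeddings with features from another modality (e.g., image or text embeddings of the same stimulus), and a cross-attention block in which category queries attend to fMRI tokens. All module names, dimensions, the choice of an InfoNCE-style loss, and the use of cross-attention are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the BrainLoc code): multi-modal alignment of fMRI
# features with another modality, plus cross-attention fusion with category
# embeddings. All names and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FMRIEncoder(nn.Module):
    def __init__(self, in_dim=4096, embed_dim=512):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.GELU(), nn.Linear(1024, embed_dim)
        )

    def forward(self, x):                       # x: (B, in_dim) flattened fMRI voxels
        return F.normalize(self.proj(x), dim=-1)

def alignment_loss(fmri_emb, other_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss aligning fMRI embeddings with another
    modality (e.g., image or text embeddings of the same stimulus)."""
    logits = fmri_emb @ other_emb.t() / temperature      # (B, B) similarity matrix
    targets = torch.arange(fmri_emb.size(0), device=fmri_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

class CrossDomainFusion(nn.Module):
    """Cross-attention: category embeddings (queries) attend to fMRI tokens."""
    def __init__(self, embed_dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, category_emb, fmri_tokens):
        # category_emb: (B, C, D) class queries; fmri_tokens: (B, T, D)
        fused, _ = self.attn(category_emb, fmri_tokens, fmri_tokens)
        return self.norm(category_emb + fused)            # residual connection

if __name__ == "__main__":
    B, C, T, D = 4, 80, 16, 512
    fmri_emb = FMRIEncoder()(torch.randn(B, 4096))
    img_emb = F.normalize(torch.randn(B, D), dim=-1)      # stand-in for image features
    print("alignment loss:", alignment_loss(fmri_emb, img_emb).item())
    out = CrossDomainFusion()(torch.randn(B, C, D), torch.randn(B, T, D))
    print("fused category features:", out.shape)          # torch.Size([4, 80, 512])
```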
Efficient Prompting for Continual Adaptation to Missing Modalities
Zirun Guo | Shulei Wang | Wang Lin | Weicai Yan | Yangyang Wu | Tao Jin
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Missing modality issues are common in real-world applications, arising from factors such as equipment failures and privacy concerns. When fine-tuning pre-trained models on downstream datasets with missing modalities, performance can degrade significantly. Current methods often aggregate various missing cases to train recovery modules or align multimodal features, resulting in suboptimal performance, high computational costs, and the risk of catastrophic forgetting in continual environments where data arrives sequentially. In this paper, we formulate the dynamic missing modality problem as a continual learning task and introduce the continual multimodal missing modality task. To address this challenge efficiently, we introduce three types of prompts: modality-specific, task-aware, and task-specific prompts. These prompts enable the model to learn intra-modality, inter-modality, intra-task, and inter-task features. Furthermore, we propose a contrastive task interaction strategy to explicitly learn prompts correlating different modalities. We conduct extensive experiments on three public datasets, where our method consistently outperforms state-of-the-art approaches.
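As a rough illustration of the prompting scheme the abstract describes, the sketch below prepends learnable modality-specific prompts to the tokens of whichever modalities are present, then prepends a shared task-aware prompt plus a per-task prompt before feeding everything through a frozen transformer backbone. Prompt lengths, the toy backbone, the pooling, and handling missing modalities by simply omitting them are assumptions on my part; the contrastive task interaction strategy is not shown here.

```python
# Hypothetical sketch (not the authors' code): modality-specific, task-aware,
# and task-specific prompts prepended to a frozen transformer's input tokens.
# Shapes, the backbone, and the pooling are illustrative.
import torch
import torch.nn as nn

class PromptedMultimodalModel(nn.Module):
    def __init__(self, num_tasks, modalities=("text", "audio", "vision"),
                 embed_dim=256, prompt_len=4, num_layers=2):
        super().__init__()
        # One learnable prompt per modality (intra-modality features).
        self.modality_prompts = nn.ParameterDict({
            m: nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
            for m in modalities
        })
        # One prompt shared across tasks (task-aware) and one per task (task-specific).
        self.task_aware_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)
        self.task_prompts = nn.Parameter(torch.randn(num_tasks, prompt_len, embed_dim) * 0.02)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers)
        for p in self.backbone.parameters():      # backbone stays frozen; only prompts train
            p.requires_grad = False

    def forward(self, tokens_by_modality, task_id):
        # tokens_by_modality: dict of modality name -> (B, L, D); missing
        # modalities are simply absent from the dict.
        batch = next(iter(tokens_by_modality.values())).size(0)
        parts = []
        for m, tokens in tokens_by_modality.items():
            prompt = self.modality_prompts[m].expand(batch, -1, -1)
            parts.append(torch.cat([prompt, tokens], dim=1))
        x = torch.cat(parts, dim=1)
        task_prompt = torch.cat([self.task_aware_prompt,
                                 self.task_prompts[task_id]], dim=0)
        x = torch.cat([task_prompt.expand(batch, -1, -1), x], dim=1)
        return self.backbone(x).mean(dim=1)       # pooled multimodal representation

if __name__ == "__main__":
    model = PromptedMultimodalModel(num_tasks=3)
    batch = {"text": torch.randn(2, 10, 256), "vision": torch.randn(2, 6, 256)}  # audio missing
    print(model(batch, task_id=1).shape)          # torch.Size([2, 256])
```

Only the prompt parameters receive gradients in this sketch, which is what makes prompt-based adaptation lightweight compared with fine-tuning or training recovery modules for every missing-modality case.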