Yuhai Lu
2023
Mulan: A Multi-Level Alignment Model for Video Question Answering
Yu Fu
|
Cong Cao
|
Yuling Yang
|
Yuhai Lu
|
Fangfang Yuan
|
Dakui Wang
|
Yanbing Liu
Findings of the Association for Computational Linguistics: EMNLP 2023
Video Question Answering (VideoQA) aims to answer questions about the visual content of a video. Current methods mainly focus on improving joint representations of video and text. However, these methods pay little attention to the fine-grained semantic interaction between video and text. In this paper, we propose Mulan: a Multi-Level Alignment Model for Video Question Answering, which establishes alignment between visual and textual modalities at the object-level, frame-level, and video-level. Specifically, for object-level alignment, we propose a mask-guided visual feature encoding method and a visual-guided text description method to learn fine-grained spatial information. For frame-level alignment, we introduce the use of visual features from individual frames, combined with a caption generator, to learn overall spatial information within the scene. For video-level alignment, we propose an expandable ordinal prompt for textual descriptions, combined with visual features, to learn temporal information. Experimental results show that our method outperforms the state-of-the-art methods, even when utilizing the smallest amount of extra visual-language pre-training data and a reduced number of trainable parameters.
2021
TEBNER: Domain Specific Named Entity Recognition with Type Expanded Boundary-aware Network
Zheng Fang
|
Yanan Cao
|
Tai Li
|
Ruipeng Jia
|
Fang Fang
|
Yanmin Shang
|
Yuhai Lu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
To alleviate label scarcity in Named Entity Recognition (NER) task, distantly supervised NER methods are widely applied to automatically label data and identify entities. Although the human effort is reduced, the generated incomplete and noisy annotations pose new challenges for learning effective neural models. In this paper, we propose a novel dictionary extension method which extracts new entities through the type expanded model. Moreover, we design a multi-granularity boundary-aware network which detects entity boundaries from both local and global perspectives. We conduct experiments on different types of datasets, the results show that our model outperforms previous state-of-the-art distantly supervised systems and even surpasses the supervised models.