Dynamic Multistep Reasoning based on Video Scene Graph for Video Question Answering
Jianguo Mao | Wenbin Jiang | Xiangdong Wang | Zhifan Feng | Yajuan Lyu | Hong Liu | Yong Zhu
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Existing video question answering (video QA) models lack the capacity for deep video understanding and flexible multistep reasoning. We propose for video QA a novel model which performs dynamic multistep reasoning between questions and videos. It creates video semantic representation based on the video scene graph composed of semantic elements of the video and semantic relations among these elements. Then, it performs multistep reasoning for better answer decision between the representations of the question and the video, and dynamically integrate the reasoning results. Experiments show the significant advantage of the proposed model against previous methods in accuracy and interpretability. Against the existing state-of-the-art model, the proposed model dramatically improves more than 4%/3.1%/2% on the three widely used video QA datasets, MSRVTT-QA, MSRVTT multi-choice, and TGIF-QA, and displays better interpretability by backtracing along with the attention mechanisms to the video scene graphs.
Knowledge-Enhanced Named Entity Disambiguation for Short Text
Zhifan Feng | Qi Wang | Wenbin Jiang | Yajuan Lyu | Yong Zhu
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing
Named entity disambiguation is an important task that plays the role of bridge between text and knowledge. However, the performance of existing methods drops dramatically for short text, which is widely used in actual application scenarios, such as information retrieval and question answering. In this work, we propose a novel knowledge-enhanced method for named entity disambiguation. Considering the problem of information ambiguity and incompleteness for short text, two kinds of knowledge, factual knowledge graph and conceptual knowledge graph, are introduced to provide additional knowledge for the semantic matching between candidate entity and mention context. Our proposed method achieves significant improvement over previous methods on a large manually annotated short-text dataset, and also achieves the state-of-the-art on three standard datasets. The short-text dataset and the proposed model will be publicly available for research use.
- Wenbin Jiang 2
- Yajuan Lyu 2
- Yong Zhu 2
- Jianguo Mao 1
- Xiangdong Wang 1
- show all...