Changwei Xu


2022

Understanding Attention for Vision-and-Language Tasks
Feiqi Cao | Soyeon Caren Han | Siqu Long | Changwei Xu | Josiah Poon
Proceedings of the 29th International Conference on Computational Linguistics

The attention mechanism has been used as an important component across Vision-and-Language (VL) tasks to bridge the semantic gap between visual and textual features. While attention has been widely used in VL tasks, the capability of different attention alignment calculations in bridging the semantic gap between visual and textual clues has not been examined. In this research, we conduct a comprehensive analysis of the role of attention alignment by looking into the attention score calculation methods and checking how they actually represent the significance of visual regions and textual tokens for the global assessment. We also analyse the conditions under which an attention score calculation mechanism would be more (or less) interpretable, and which may impact model performance on three different VL tasks: visual question answering, text-to-image generation, and text-and-image matching (both sentence and image retrieval). Our analysis is the first of its kind and provides useful insights into the importance of each attention alignment score calculation when applied at the training phase of VL tasks, which is commonly ignored in attention-based cross-modal models and/or pretrained models. Our code is available at: https://github.com/adlnlp/Attention_VL

2021

基于大规模语料库的《古籍汉字分级字表》研究 (The Formulation of the Graded Chinese Character List of Ancient Books Based on a Large-scale Corpus)
Changwei Xu (许长伟) | Minxuan Feng (冯敏萱) | Bin Li (李斌) | Yiguo Yuan (袁义国)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

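The Graded Chinese Character List of Ancient Books (《古籍汉字分级字表》) is a graded character list developed from a large-scale corpus of ancient book texts to help learners read ancient documents. The list fills a gap in research on character lists for ancient books and grades the characters according to their learning priority. It currently contains 105 first-level characters, 340 second-level characters, and 555 third-level characters. This paper introduces the main basis and basic steps of the list's development, and compares it with the traditional literacy primers known as "San Bai Qian" (三百千) and with the List of Commonly Used Characters in Modern Chinese (《现代汉语常用字表》) to verify the soundness of its character selection. The list helps learners master the commonly used characters of ancient texts first and improve their ability to read ancient books, thereby promoting the inheritance and development of fine traditional Chinese culture.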

2020

Integration of Automatic Sentence Segmentation and Lexical Analysis of Ancient Chinese based on BiLSTM-CRF Model
Ning Cheng | Bin Li | Liming Xiao | Changwei Xu | Sijia Ge | Xingyue Hao | Minxuan Feng
Proceedings of LT4HALA 2020 - 1st Workshop on Language Technologies for Historical and Ancient Languages

The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging and named entity recognition. Tasks such as lexical analysis must build on sentence segmentation because many ancient books are not punctuated. However, step-by-step processing is prone to propagating errors across multiple levels. This paper designs and implements an integrated annotation system for sentence segmentation and lexical analysis. The BiLSTM-CRF neural network model is used to verify the generalization ability and the effect of sentence segmentation and lexical analysis at different label levels on four cross-age test sets. The results show that the integrated method improves the F1-score of sentence segmentation, word segmentation and part-of-speech tagging for ancient Chinese. Based on the experimental results of each test set, the F1-score of sentence segmentation reached 78.95%, with an average increase of 3.5%; the F1-score of word segmentation reached 85.73%, with an average increase of 0.18%; and the F1-score of part-of-speech tagging reached 72.65%, with an average increase of 0.35%.