Xiao Luo


2024

DEMO: A Statistical Perspective for Efficient Image-Text Matching
Fan Zhang | Xian-Sheng Hua | Chong Chen | Xiao Luo
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Image-text matching is a long-standing problem that seeks to connect vision and language through semantic understanding. Owing to their ability to handle large-scale raw data, unsupervised hashing-based approaches have gained prominence recently. They typically construct a semantic similarity structure using the natural distance, which subsequently guides the optimization of the hashing network. However, the similarity structure could be biased at the boundaries of semantic distributions, causing error accumulation during sequential optimization. To tackle this, we introduce a novel hashing approach termed Distribution-based Structure Mining with Consistency Learning (DEMO) for efficient image-text matching. From a statistical view, DEMO characterizes each image using multiple augmented views, which are treated as samples drawn from its intrinsic semantic distribution. Then, we employ a non-parametric distribution divergence to ensure a robust and precise similarity structure. In addition, we introduce collaborative consistency learning, which not only preserves the similarity structure in the Hamming space but also encourages consistency between retrieval distributions from different directions in a self-supervised manner. Extensive experiments on several widely used datasets demonstrate that DEMO achieves superior performance compared with various state-of-the-art methods.

2022

AGRank: Augmented Graph-based Unsupervised Keyphrase Extraction
Haoran Ding | Xiao Luo
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Keywords or keyphrases are often used to highlight a document’s domains or main topics. Unsupervised keyphrase extraction (UKE) is attractive because it requires no labeled data to train a model. This paper proposes an augmented graph-based unsupervised model that identifies keyphrases in a document by integrating graph and deep learning methods. The proposed model uses mutual attention extracted from the pre-trained BERT model to build the candidate graph and augments the graph with global and local context nodes to improve performance. The proposed model is evaluated on four publicly available datasets against thirteen UKE baselines. The results show that the proposed model is an effective and robust UKE model for both long and short documents. Our source code is available on GitHub.

2021

AttentionRank: Unsupervised Keyphrase Extraction using Self and Cross Attentions
Haoran Ding | Xiao Luo
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Keyword or keyphrase extraction identifies the words or phrases that represent the main topics of a document. This paper proposes AttentionRank, a hybrid attention model, to identify keyphrases from a document in an unsupervised manner. AttentionRank calculates self-attention and cross-attention using a pre-trained language model. The self-attention is designed to determine the importance of a candidate within the context of a sentence. The cross-attention is calculated to identify the semantic relevance between a candidate and the sentences within a document. We evaluate AttentionRank on three publicly available datasets against seven baselines. The results show that AttentionRank is an effective and robust unsupervised keyphrase extraction model on both long and short documents. Source code is available on GitHub.

2018

A Study on the Korean and Chinese Pronunciation of Chinese Characters and Learning Korean as a Second Language
Xiao Luo | Yike Yang | Jing Sun
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation

2002

Covering Ambiguity Resolution in Chinese Word Segmentation Based on Contextual Information
Xiao Luo | Maosong Sun | Benjamin K. Tsou
COLING 2002: The 19th International Conference on Computational Linguistics