Xuan Zhu
2020
Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections
Yi-An Lai | Xuan Zhu | Yi Zhang | Mona Diab
Proceedings of the Twelfth Language Resources and Evaluation Conference
Summarizing data samples by quantitative measures has a long history, with descriptive statistics being a case in point. However, as natural language processing methods flourish, there are still insufficient characteristic metrics to describe a collection of texts in terms of the words, sentences, or paragraphs they comprise. In this work, we propose metrics of diversity, density, and homogeneity that quantitatively measure the dispersion, sparsity, and uniformity of a text collection. We conduct a series of simulations to verify that each metric holds desired properties and resonates with human intuitions. Experiments on real-world datasets demonstrate that the proposed characteristic metrics are highly correlated with text classification performance of a renowned model, BERT, which could inspire future applications.
2018
Quantifying Context Overlap for Training Word Embeddings
Yimeng Zhuang | Jinghui Xie | Yinhe Zheng | Xuan Zhu
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Most models for learning word embeddings are trained based on the context information of words, more precisely, first-order co-occurrence relations. In this paper, a metric is designed to estimate second-order co-occurrence relations based on context overlap. The estimated values are further used as augmented data to enhance the learning of word embeddings by joint training with existing neural word embedding models. Experimental results show that better word vectors can be obtained for word similarity tasks and some downstream NLP tasks by the enhanced approach.
2015
Learning Tag Embeddings and Tag-specific Composition Functions in Recursive Neural Network
Qiao Qian | Bo Tian | Minlie Huang | Yang Liu | Xuan Zhu | Xiaoyan Zhu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)