Incorporating Uncertain Segmentation Information into Chinese NER for Social Media Text
Shengbin Jia | Ling Ding | Xiaojun Chen | Shijia E | Yang Xiang
Proceedings of the Eighth International Workshop on Natural Language Processing for Social Media

Chinese word segmentation is necessary to provide word-level information for Chinese named entity recognition (NER) systems. However, segmentation error propagation is a challenge for Chinese NER when processing colloquial data such as social media text. In this paper, we propose a model (UIcwsNN) that specializes in identifying entities in Chinese social media text, in particular by leveraging uncertain word segmentation information. This ambiguous information covers all the potential segmentation states of a sentence, which gives the model a channel to infer deep word-level characteristics. We propose a trilogy (i.e., Candidate Position Embedding => Position Selective Attention => Adaptive Word Convolution) to encode uncertain word segmentation information and acquire an appropriate word-level representation. Experimental results on the social media corpus show that our model effectively alleviates the problem of cascading segmentation errors and achieves a significant performance improvement of 2% over previous state-of-the-art methods.


Naive Bayes and BiLSTM Ensemble for Discriminating between Mainland and Taiwan Variation of Mandarin Chinese
Li Yang | Yang Xiang
Proceedings of the Sixth Workshop on NLP for Similar Languages, Varieties and Dialects

Automatic dialect identification is a more challenging task than language identification, as it requires the ability to discriminate between varieties of one language. In this paper, we propose an ensemble-based system, which combines traditional machine learning models trained on bag-of-n-gram features with deep learning models trained on word embeddings, to solve the Discriminating between Mainland and Taiwan Variation of Mandarin Chinese (DMT) shared task at VarDial 2019. Our experiments show that a Naive Bayes model based on a combination of character bigrams and trigrams is a very strong model for identifying varieties of Mandarin Chinese. Through a further ensemble of Naive Bayes and BiLSTM, our system (team: itsalexyang) achieved macro-averaged F1 scores of 0.8530 and 0.8687 in the two tracks.


Incorporating Label Dependency for Answer Quality Tagging in Community Question Answering via CNN-LSTM-CRF
Yang Xiang | Xiaoqiang Zhou | Qingcai Chen | Zhihui Zheng | Buzhou Tang | Xiaolong Wang | Yang Qin
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In community question answering (cQA), the quality of answers is determined by the matching degree between question-answer pairs and the correlation among the answers. In this paper, we show that the dependency between the answer quality labels also plays a pivotal role. To validate the effectiveness of label dependency, we propose two neural network-based models with different combinations of Convolutional Neural Networks, Long Short-Term Memory and Conditional Random Fields. Extensive experiments are conducted on the dataset released by the SemEval-2015 cQA shared task. The first model is a stacked ensemble of the networks. It achieves 58.96% macro-averaged F1, which improves on the state-of-the-art neural network-based method by 2.82% and outperforms the Top-1 system in the shared task by 1.77%. The second is a simple attention-based model whose input is the concatenation of the question and its corresponding answers. It produces promising results, with 58.29% overall F1, and achieves the best performance on the Good and Bad categories.


Chinese Grammatical Error Diagnosis Using Ensemble Learning
Yang Xiang | Xiaolong Wang | Wenying Han | Qinghua Hong
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

ICRC-HIT: A Deep Learning based Comment Sequence Labeling System for Answer Selection Challenge
Xiaoqiang Zhou | Baotian Hu | Jiaxin Lin | Yang Xiang | Xiaolong Wang
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)


Problematic Situation Analysis and Automatic Recognition for Chinese Online Conversational System
Yang Xiang | Yaoyun Zhang | Xiaoqiang Zhou | Xiaolong Wang | Yang Qin
Proceedings of The Third CIPS-SIGHAN Joint Conference on Chinese Language Processing


Grammatical Error Correction Using Feature Selection and Confidence Tuning
Yang Xiang | Yaoyun Zhang | Xiaolong Wang | Chongqiang Wei | Wen Zheng | Xiaoqiang Zhou | Yuxiu Hu | Yang Qin
Proceedings of the Sixth International Joint Conference on Natural Language Processing

A Hybrid Model For Grammatical Error Correction
Yang Xiang | Bo Yuan | Yaoyun Zhang | Xiaolong Wang | Wen Zheng | Chongqiang Wei
Proceedings of the Seventeenth Conference on Computational Natural Language Learning: Shared Task


A Mixed Deterministic Model for Coreference Resolution
Bo Yuan | Qingcai Chen | Yang Xiang | Xiaolong Wang | Liping Ge | Zengjian Liu | Meng Liao | Xianbo Si
Joint Conference on EMNLP and CoNLL - Shared Task