Wenxiu Xie


pdf bib
Standard and Non-standard Adverbial Markers: a Diachronic Analysis in Modern Chinese Literature
John Lee | Fangqiong Zhan | Wenxiu Xie | Xiao Han | Chi-yin Chow | Kam-yiu Lam
Proceedings of the 7th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

This paper investigates the use of standard and non-standard adverbial markers in modern Chinese literature. In Chinese, adverbials can be derived from many adjectives, adverbs and verbs with the suffix “de”. The suffix has a standard and a non-standard written form, both of which are frequently used. Contrastive research on these two competing forms has mostly been qualitative or limited to small text samples. In this first large-scale quantitative study, we present statistics on 346 adverbial types from an 8-million-character text corpus drawn from Chinese literature in the 20th century. We present a semantic analysis of the verbs modified by adverbs with standard and non-standard markers, and a chronological analysis of marker choice among six prominent modern Chinese authors. We show that the non-standard form is more frequently used when the adverbial modifies an emotion verb. Further, we demonstrate that marker choice is correlated to text genre and register, as well as the writing style of the author.


pdf bib
中文糖尿病问题分类体系及标注语料库构建研究(The Construction of Question Taxonomy and An Annotated Chinese Corpus for Diabetes Question Classification)
Xiaobo Qian (钱晓波) | Wenxiu Xie (谢文秀) | Shaopei Long (龙绍沛) | Murong Lan (兰牧融) | Yuanyuan Mu (慕媛媛) | Tianyong Hao (郝天永)
Proceedings of the 21st Chinese National Conference on Computational Linguistics



pdf bib
Unsupervised Adverbial Identification in Modern Chinese Literature
Wenxiu Xie | John Lee | Fangqiong Zhan | Xiao Han | Chi-Yin Chow
Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature

In many languages, adverbials can be derived from words of various parts-of-speech. In Chinese, the derivation may be marked either with the standard adverbial marker DI, or the non-standard marker DE. Since DE also serves double duty as the attributive marker, accurate identification of adverbials requires disambiguation of its syntactic role. As parsers are trained predominantly on texts using the standard adverbial marker DI, they often fail to recognize adverbials suffixed with the non-standard DE. This paper addresses this problem with an unsupervised, rule-based approach for adverbial identification that utilizes dependency tree patterns. Experiment results show that this approach outperforms a masked language model baseline. We apply this approach to analyze standard and non-standard adverbial marker usage in modern Chinese literature.


pdf bib
A Counselling Corpus in Cantonese
John Lee | Tianyuan Cai | Wenxiu Xie | Lam Xing
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

Virtual agents are increasingly used for delivering health information in general, and mental health assistance in particular. This paper presents a corpus designed for training a virtual counsellor in Cantonese, a variety of Chinese. The corpus consists of a domain-independent subcorpus that supports small talk for rapport building with users, and a domain-specific subcorpus that provides material for a particular area of counselling. The former consists of ELIZA style responses, chitchat expressions, and a dataset of general dialog, all of which are reusable across counselling domains. The latter consists of example user inputs and appropriate chatbot replies relevant to the specific domain. In a case study, we created a chatbot with a domain-specific subcorpus that addressed 25 issues in test anxiety, with 436 inputs solicited from native speakers of Cantonese and 150 chatbot replies harvested from mental health websites. Preliminary evaluations show that Word Mover’s Distance achieved 56% accuracy in identifying the issue in user input, outperforming a number of baselines.


pdf bib
A Customizable Editor for Text Simplification
John Lee | Wenlong Zhao | Wenxiu Xie
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

We present a browser-based editor for simplifying English text. Given an input sentence, the editor performs both syntactic and lexical simplification. It splits a complex sentence into shorter ones, and suggests word substitutions in drop-down lists. The user can choose the best substitution from the list, undo any inappropriate splitting, and further edit the sentence as necessary. A significant novelty is that the system accepts a customized vocabulary list for a target reader population. It identifies all words in the text that do not belong to the list, and attempts to substitute them with words from the list, thus producing a text tailored for the targeted readers.