Chak Yan Yeung


2021

Text Retrieval for Language Learners: Graded Vocabulary vs. Open Learner Model
John Lee | Chak Yan Yeung
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

A text retrieval system for language learning returns reading materials at the appropriate difficulty level for the user. The system typically maintains a learner model on the user’s vocabulary knowledge, and identifies texts that best fit the model. As the user’s language proficiency increases, model updates are necessary to retrieve texts with the corresponding lexical complexity. We investigate an open learner model that allows user modification of its content, and evaluate its effectiveness with respect to the amount of user update effort. We compare this model with the graded approach, in which the system returns texts at the optimal grade. When the user makes at least half of the expected updates to the open learner model, simulation results show that it outperforms the graded approach in retrieving texts that fit user preference for new-word density.
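
The notion of new-word density in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the function names, the toy texts, and the set-based learner model are all assumptions made for illustration. The sketch computes the share of tokens unknown to the learner and returns the text whose density is closest to a target.

```python
def new_word_density(tokens, known_words):
    """Fraction of tokens that the learner model does not list as known."""
    if not tokens:
        return 0.0
    unknown = sum(1 for token in tokens if token not in known_words)
    return unknown / len(tokens)


def retrieve(texts, known_words, target_density):
    """Return the name of the text whose new-word density is closest to the target."""
    return min(
        texts,
        key=lambda name: abs(new_word_density(texts[name], known_words) - target_density),
    )


# Hypothetical learner model and candidate texts (illustrative data only).
known = {"the", "cat", "dog", "sat", "on", "mat", "and"}
texts = {
    "easy": ["the", "cat", "sat", "on", "the", "mat"],
    "medium": ["the", "dog", "sat", "on", "the", "mat", "and", "the", "cat", "yawned"],
    "hard": ["ephemeral", "cat", "pondered", "the", "mat"],
}

best = retrieve(texts, known, target_density=0.1)

# An open learner model lets the user edit the model directly,
# which changes the density of every text that contains the edited word.
updated = known | {"ephemeral"}
hard_before = new_word_density(texts["hard"], known)
hard_after = new_word_density(texts["hard"], updated)
```

In this toy setup, marking one more word as known lowers the density of the "hard" text, which is the kind of effect a user update to the model would have on retrieval.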

Character Set Construction for Chinese Language Learning
Chak Yan Yeung | John Lee
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

To promote efficient learning of Chinese characters, pedagogical materials may present not only a single character, but a set of characters that are related in meaning and in written form. This paper investigates automatic construction of these character sets. The proposed model represents a character as averaged word vectors of common words containing the character. It then identifies sets of characters with high semantic similarity through clustering. Human evaluation shows that this representation outperforms direct use of character embeddings, and that the resulting character sets capture distinct semantic ranges.
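
The character representation described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the three-dimensional word vectors are made-up values, the helper names are hypothetical, and the clustering step is left out. Each character vector is the average of the vectors of the words that contain the character, and cosine similarity compares two characters.

```python
import math

# Toy word vectors with made-up values (an assumption for illustration).
WORD_VECTORS = {
    "河流": [0.9, 0.1, 0.0],
    "河水": [0.8, 0.2, 0.0],
    "湖水": [0.7, 0.3, 0.1],
    "湖泊": [0.8, 0.1, 0.1],
    "跑步": [0.0, 0.1, 0.9],
    "奔跑": [0.1, 0.0, 0.8],
}


def char_vector(char, word_vectors):
    """Average the vectors of all words that contain the character."""
    rows = [vec for word, vec in word_vectors.items() if char in word]
    if not rows:
        raise KeyError(f"no word contains {char!r}")
    dim = len(rows[0])
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


river = char_vector("河", WORD_VECTORS)
lake = char_vector("湖", WORD_VECTORS)
run = char_vector("跑", WORD_VECTORS)

# Characters with related meanings should score higher than unrelated ones.
related = cosine(river, lake)
unrelated = cosine(river, run)
```

A clustering algorithm applied to such vectors would then group characters like 河 and 湖 together; which algorithm to use is not stated in the abstract.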

2020

A Dataset for Investigating the Impact of Feedback on Student Revision Outcome
Ildiko Pilan | John Lee | Chak Yan Yeung | Jonathan Webster
Proceedings of the Twelfth Language Resources and Evaluation Conference

We present an annotation scheme and a dataset of teacher feedback provided for texts written by non-native speakers of English. The dataset consists of student-written sentences in their original and revised versions with teacher feedback provided for the errors. Feedback appears both in the form of open-ended comments and error category tags. We focus on a specific error type, namely linking adverbial (e.g. however, moreover) errors. The dataset has been annotated for two aspects: (i) revision outcome, establishing whether the re-written student sentence was correct, and (ii) directness, indicating whether teachers explicitly provided the correction in their feedback. This dataset allows for studies of the characteristics of teacher feedback and how these influence students’ revision outcome. We describe the data preparation process and present initial statistical investigations regarding the effect of different feedback characteristics on revision outcome. These show that open-ended comments and mitigating expressions appear in a higher proportion of successful revisions than unsuccessful ones, while directness and metalinguistic terms have no effect. Given that the use of this type of data is relatively unexplored in natural language processing (NLP) applications, we also report some observations and challenges when working with feedback data.

2019

Difficulty-aware Distractor Generation for Gap-Fill Items
Chak Yan Yeung | John Lee | Benjamin Tsou
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

Personalized Substitution Ranking for Lexical Simplification
John Lee | Chak Yan Yeung
Proceedings of the 12th International Conference on Natural Language Generation

A lexical simplification (LS) system substitutes difficult words in a text with simpler ones to make it easier for the user to understand. In the typical LS pipeline, the Substitution Ranking step determines the best substitution out of a set of candidates. Most current systems do not consider the user’s vocabulary proficiency, and always aim for the simplest candidate. This approach may overlook less-simple candidates that the user can understand, and that are semantically closer to the original word. We propose a personalized approach for Substitution Ranking to identify the candidate that is the closest synonym and is non-complex for the user. In experiments on learners of English at different proficiency levels, we show that this approach enhances the semantic faithfulness of the output, at the cost of a relatively small increase in the number of complex words.
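
The selection rule in this abstract (the closest synonym that is non-complex for the user) can be sketched as below. The function name, the candidate similarity scores, and the vocabulary sets are hypothetical; the paper's actual models for similarity and for predicting word complexity are not reproduced here.

```python
def rank_substitution(candidates, known_words):
    """Pick the candidate most similar to the original word among those the user knows.

    candidates: list of (word, similarity_to_original) pairs.
    known_words: set of words assumed to be non-complex for this user.
    Returns None when the user knows none of the candidates.
    """
    understandable = [(word, sim) for word, sim in candidates if word in known_words]
    if not understandable:
        return None
    return max(understandable, key=lambda pair: pair[1])[0]


# Hypothetical candidates for the word "arduous", with made-up similarity scores.
candidates = [("strenuous", 0.9), ("difficult", 0.8), ("hard", 0.6)]

beginner = {"hard"}
intermediate = {"hard", "difficult"}

for_beginner = rank_substitution(candidates, beginner)
for_intermediate = rank_substitution(candidates, intermediate)
```

A system that always aims for the simplest candidate would return "hard" for both users; the personalized rule gives the intermediate learner "difficult", which is closer in meaning to the original word in this toy data.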

2018

Personalizing Lexical Simplification
John Lee | Chak Yan Yeung
Proceedings of the 27th International Conference on Computational Linguistics

A lexical simplification (LS) system aims to substitute complex words with simple words in a text, while preserving its meaning and grammaticality. Despite individual users’ differences in vocabulary knowledge, current systems do not consider these variations; rather, they are trained to find one optimal substitution or ranked list of substitutions for all users. We evaluate the performance of a state-of-the-art LS system on individual learners of English at different proficiency levels, and measure the benefits of using complex word identification (CWI) models to personalize the system. Experimental results show that even a simple personalized CWI model, based on graded vocabulary lists, can help the system avoid some unnecessary simplifications and produce more readable output.

Personalized Text Retrieval for Learners of Chinese as a Foreign Language
Chak Yan Yeung | John Lee
Proceedings of the 27th International Conference on Computational Linguistics

This paper describes a personalized text retrieval algorithm that helps language learners select the most suitable reading material in terms of vocabulary complexity. The user first rates their knowledge of a small set of words, chosen by a graph-based active learning model. The system trains a complex word identification model on this set, and then applies the model to find texts that contain the desired proportion of new, challenging, and familiar vocabulary. In an evaluation on learners of Chinese as a foreign language, we show that this algorithm is effective in identifying simpler texts for low-proficiency learners, and more challenging ones for high-proficiency learners.

2017

Identifying Speakers and Listeners of Quoted Speech in Literary Works
Chak Yan Yeung | John Lee
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

We present the first study that evaluates both speaker and listener identification for direct speech in literary texts. Our approach consists of two steps: identification of speakers and listeners near the quotes, and dialogue chain segmentation. Evaluation results on a corpus of literary texts show that this approach outperforms a state-of-the-art rule-based approach.

2016

An Annotated Corpus of Direct Speech
John Lee | Chak Yan Yeung
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We propose a scheme for annotating direct speech in literary texts, based on the Text Encoding Initiative (TEI) and the coreference annotation guidelines from the Message Understanding Conference (MUC). The scheme encodes the speakers and listeners of utterances in a text, as well as the quotative verbs that report the utterances. We measure inter-annotator agreement on this annotation task. We then present statistics on a manually annotated corpus that consists of books from the New Testament. Finally, we visualize the corpus as a conversational network.

2015

Automatic Detection of Sentence Fragments
Chak Yan Yeung | John Lee
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

2014

Automatic Detection of Comma Splices
John Lee | Chak Yan Yeung | Martin Chodorow
Proceedings of the 28th Pacific Asia Conference on Language, Information and Computing

2012

Extracting Networks of People and Places from Literary Texts
John Lee | Chak Yan Yeung
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation