Jie Chi


pdf bib
Analyzing the Role of Part-of-Speech in Code-Switching: A Corpus-Based Study
Jie Chi | Peter Bell
Findings of the Association for Computational Linguistics: EACL 2024

Code-switching (CS) is a common linguistic phenomenon wherein speakers fluidly transition between languages in conversation. While the cognitive processes driving CS remain a complex domain, earlier investigations have shed light on its multifaceted triggers. This study delves into the influence of Part-of-Speech (POS) on the propensity of bilinguals to engage in CS, employing a comprehensive analysis of Spanish-English and Mandarin-English corpora. Compared with prior research, our findings not only affirm the existence of a statistically significant connection between POS and the likelihood of CS across language pairs, but notably find this relationship exhibits its maximum strength in proximity to CS instances, progressively diminishing as tokens distance themselves from these CS points.


pdf bib
Improving Code-switched ASR with Linguistic Information
Jie Chi | Peter Bell
Proceedings of the 29th International Conference on Computational Linguistics

This paper seeks to improve the performance of automatic speech recognition (ASR) systems operating on code-switched speech. Code-switching refers to the alternation of languages within a conversation, a phenomenon that is of increasing importance considering the rapid rise in the number of bilingual speakers in the world. It is particularly challenging for ASR owing to the relative scarcity of code-switching speech and text data, even when the individual languages are themselves well-resourced. This paper proposes to overcome this challenge by applying linguistic theories in order to generate more realistic code-switching text, necessary for language modelling in ASR. Working with English-Spanish code-switching, we find that Equivalence Constraint theory and part-of-speech labelling are particularly helpful for text generation, and bring 2% improvement to ASR performance.


pdf bib
Querent Intent in Multi-Sentence Questions
Laurie Burchell | Jie Chi | Tom Hosking | Nina Markl | Bonnie Webber
Proceedings of the 14th Linguistic Annotation Workshop

Multi-sentence questions (MSQs) are sequences of questions connected by relations which, unlike sequences of standalone questions, need to be answered as a unit. Following Rhetorical Structure Theory (RST), we recognise that different “question discourse relations” between the subparts of MSQs reflect different speaker intents, and consequently elicit different answering strategies. Correctly identifying these relations is therefore a crucial step in automatically answering MSQs. We identify five different types of MSQs in English, and define five novel relations to describe them. We extract over 162,000 MSQs from Stack Exchange to enable future research. Finally, we implement a high-precision baseline classifier based on surface features.