Nancy Chen

Also published as: Nancy F. Chen


2021

pdf bib
Velocidapter: Task-oriented Dialogue Comprehension Modeling Pairing Synthetic Text Generation with Domain Adaptation
Ibrahim Taha Aksu | Zhengyuan Liu | Min-Yen Kan | Nancy Chen
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

We introduce a synthetic dialogue generation framework, Velocidapter, which addresses the corpus availability problem for dialogue comprehension. Velocidapter augments datasets by simulating synthetic conversations for a task-oriented dialogue domain, requiring a small amount of bootstrapping work for each new domain. We evaluate the efficacy of our framework on a task-oriented dialogue comprehension dataset, MRCWOZ, which we curate by annotating questions for slots in the restaurant, taxi, and hotel domains of the MultiWOZ 2.2 dataset (Zang et al., 2020). We run experiments within a low-resource setting, where we pretrain a model on SQuAD, fine-tuning it on either a small original data or on the synthetic data generated by our framework. Velocidapter shows significant improvements using both the transformer-based BERTBase and BiDAF as base models. We further show that the framework is easy to use by novice users and conclude that Velocidapter can greatly help training over task-oriented dialogues, especially for low-resourced emerging domains.

pdf bib
Coreference-Aware Dialogue Summarization
Zhengyuan Liu | Ke Shi | Nancy Chen
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Summarizing conversations via neural approaches has been gaining research traction lately, yet it is still challenging to obtain practical solutions. Examples of such challenges include unstructured information exchange in dialogues, informal interactions between speakers, and dynamic role changes of speakers as the dialogue evolves. Many of such challenges result in complex coreference links. Therefore, in this work, we investigate different approaches to explicitly incorporate coreference information in neural abstractive dialogue summarization models to tackle the aforementioned challenges. Experimental results show that the proposed approaches achieve state-of-the-art performance, implying it is useful to utilize coreference information in dialogue summarization. Evaluation results on factual correctness suggest such coreference-aware models are better at tracing the information flow among interlocutors and associating accurate status/actions with the corresponding interlocutors and person mentions.

pdf bib
Controllable Neural Dialogue Summarization with Personal Named Entity Planning
Zhengyuan Liu | Nancy Chen
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose a controllable neural generation framework that can flexibly guide dialogue summarization with personal named entity planning. The conditional sequences are modulated to decide what types of information or what perspective to focus on when forming summaries to tackle the under-constrained problem in summarization tasks. This framework supports two types of use cases: (1) Comprehensive Perspective, which is a general-purpose case with no user-preference specified, considering summary points from all conversational interlocutors and all mentioned persons; (2) Focus Perspective, positioning the summary based on a user-specified personal named entity, which could be one of the interlocutors or one of the persons mentioned in the conversation. During training, we exploit occurrence planning of personal named entities and coreference information to improve temporal coherence and to minimize hallucination in neural generation. Experimental results show that our proposed framework generates fluent and factually consistent summaries under various planning controls using both objective metrics and human evaluations.

pdf bib
Analyzing Code Embeddings for Coding Clinical Narratives
Wei Shi | Jiewen Wu | Xiwen Yang | Nancy Chen | Ivan Ho Mien | Jung-Jae Kim | Pavitra Krishnaswamy
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Improving Multi-Party Dialogue Discourse Parsing via Domain Integration
Zhengyuan Liu | Nancy Chen
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

While multi-party conversations are often less structured than monologues and documents, they are implicitly organized by semantic level correlations across the interactive turns, and dialogue discourse analysis can be applied to predict the dependency structure and relations between the elementary discourse units, and provide feature-rich structural information for downstream tasks. However, the existing corpora with dialogue discourse annotation are collected from specific domains with limited sample sizes, rendering the performance of data-driven approaches poor on incoming dialogues without any domain adaptation. In this paper, we first introduce a Transformer-based parser, and assess its cross-domain performance. We next adopt three methods to gain domain integration from both data and language modeling perspectives to improve the generalization capability. Empirical results show that the neural parser can benefit from our proposed methods, and performs better on cross-domain dialogue samples.

pdf bib
DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing
Zhengyuan Liu | Ke Shi | Nancy Chen
Proceedings of the 2nd Workshop on Computational Approaches to Discourse

Text discourse parsing weighs importantly in understanding information flow and argumentative structure in natural language, making it beneficial for downstream tasks. While previous work significantly improves the performance of RST discourse parsing, they are not readily applicable to practical use cases: (1) EDU segmentation is not integrated into most existing tree parsing frameworks, thus it is not straightforward to apply such models on newly-coming data. (2) Most parsers cannot be used in multilingual scenarios, because they are developed only in English. (3) Parsers trained from single-domain treebanks do not generalize well on out-of-domain inputs. In this work, we propose a document-level multilingual RST discourse parsing framework, which conducts EDU segmentation and discourse tree parsing jointly. Moreover, we propose a cross-translation augmentation strategy to enable the framework to support multilingual parsing and improve its domain generality. Experimental results show that our model achieves state-of-the-art performance on document-level multilingual RST parsing in all sub-tasks.

2020

pdf bib
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le | Doyen Sahoo | Nancy Chen | Steven C.H. Hoi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Video-grounded dialogues are very challenging due to (i) the complexity of videos which contain both spatial and temporal variations, and (ii) the complexity of user utterances which query different segments and/or different objects in videos over multiple dialogue turns. However, existing approaches to video-grounded dialogues often focus on superficial temporal-level visual cues, but neglect more fine-grained spatial signals from videos. To address this drawback, we proposed Bi-directional Spatio-Temporal Learning (BiST), a vision-language neural framework for high-resolution queries in videos based on textual cues. Specifically, our approach not only exploits both spatial and temporal-level information, but also learns dynamic information diffusion between the two feature spaces through spatial-to-temporal and temporal-to-spatial reasoning. The bidirectional strategy aims to tackle the evolving semantics of user queries in the dialogue setting. The retrieved visual cues are used as contextual information to construct relevant responses to the users. Our empirical results and comprehensive qualitative analysis show that BiST achieves competitive performance and generates reasonable responses on a large-scale AVSD benchmark. We also adapt our BiST models to the Video QA setting, and substantially outperform prior approaches on the TGIF-QA benchmark.

pdf bib
UniConv: A Unified Conversational Neural Architecture for Multi-domain Task-oriented Dialogues
Hung Le | Doyen Sahoo | Chenghao Liu | Nancy Chen | Steven C.H. Hoi
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Building an end-to-end conversational agent for multi-domain task-oriented dialogues has been an open challenge for two main reasons. First, tracking dialogue states of multiple domains is non-trivial as the dialogue agent must obtain complete states from all relevant domains, some of which might have shared slots among domains as well as unique slots specifically for one domain only. Second, the dialogue agent must also process various types of information across domains, including dialogue context, dialogue states, and database, to generate natural responses to users. Unlike the existing approaches that are often designed to train each module separately, we propose “UniConv” - a novel unified neural architecture for end-to-end conversational systems in multi-domain task-oriented dialogues, which is designed to jointly train (i) a Bi-level State Tracker which tracks dialogue states by learning signals at both slot and domain level independently, and (ii) a Joint Dialogue Act and Response Generator which incorporates information from various input components and models dialogue acts and target responses simultaneously. We conduct comprehensive experiments in dialogue state tracking, context-to-text, and end-to-end settings on the MultiWOZ2.1 benchmark, achieving superior performance over competitive baselines.

pdf bib
Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences
Yi Tay | Donovan Ong | Jie Fu | Alvin Chan | Nancy Chen | Anh Tuan Luu | Chris Pal
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Understanding human preferences, along with cultural and social nuances, lives at the heart of natural language understanding. Concretely, we present a new task and corpus for learning alignments between machine and human preferences. Our newly introduced problem is concerned with predicting the preferable options from two sentences describing scenarios that may involve social and cultural situations. Our problem is framed as a natural language inference task with crowd-sourced preference votes by human players, obtained from a gamified voting platform. We benchmark several state-of-the-art neural models, along with BERT and friends on this task. Our experimental results show that current state-of-the-art NLP models still leave much room for improvement.

pdf bib
Multilingual Neural RST Discourse Parsing
Zhengyuan Liu | Ke Shi | Nancy Chen
Proceedings of the 28th International Conference on Computational Linguistics

Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language. Previous research under the Rhetorical Structure Theory (RST) has mostly focused on inducing and evaluating models from the English treebank. However, the parsing tasks for other languages such as German, Dutch, and Portuguese are still challenging due to the shortage of annotated data. In this work, we investigate two approaches to establish a neural, cross-lingual discourse parser via: (1) utilizing multilingual vector representations; and (2) adopting segment-level translation of the source content. Experiment results show that both methods are effective even with limited training data, and achieve state-of-the-art performance on cross-lingual, document-level discourse parsing on all sub-tasks.

pdf bib
Uncertainty Modeling for Machine Comprehension Systems using Efficient Bayesian Neural Networks
Zhengyuan Liu | Pavitra Krishnaswamy | Ai Ti Aw | Nancy Chen
Proceedings of the 28th International Conference on Computational Linguistics: Industry Track

While neural approaches have achieved significant improvement in machine comprehension tasks, models often work as a black-box, resulting in lower interpretability, which requires special attention in domains such as healthcare or education. Quantifying uncertainty helps pave the way towards more interpretable neural networks. In classification and regression tasks, Bayesian neural networks have been effective in estimating model uncertainty. However, inference time increases linearly due to the required sampling process in Bayesian neural networks. Thus speed becomes a bottleneck in tasks with high system complexity such as question-answering or dialogue generation. In this work, we propose a hybrid neural architecture to quantify model uncertainty using Bayesian weight approximation but boosts up the inference speed by 80% relative at test time, and apply it for a clinical dialogue comprehension task. The proposed approach is also used to enable active learning so that an updated model can be trained more optimally with new incoming data by selecting samples that are not well-represented in the current training scheme.

pdf bib
Conditional Neural Generation using Sub-Aspect Functions for Extractive News Summarization
Zhengyuan Liu | Ke Shi | Nancy Chen
Findings of the Association for Computational Linguistics: EMNLP 2020

Much progress has been made in text summarization, fueled by neural architectures using large-scale training corpora. However, in the news domain, neural models easily overfit by leveraging position-related features due to the prevalence of the inverted pyramid writing style. In addition, there is an unmet need to generate a variety of summaries for different users. In this paper, we propose a neural framework that can flexibly control summary generation by introducing a set of sub-aspect functions (i.e. importance, diversity, position). These sub-aspect functions are regulated by a set of control codes to decide which sub-aspect to focus on during summary generation. We demonstrate that extracted summaries with minimal position bias is comparable with those generated by standard models that take advantage of position preference. We also show that news summaries generated with a focus on diversity can be more preferred by human raters. These results suggest that a more flexible neural summarization framework providing more control options could be desirable in tailoring to different user preferences, which is useful since it is often impractical to articulate such preferences for different applications a priori.

2019

pdf bib
Reading Turn by Turn: Hierarchical Attention Architecture for Spoken Dialogue Comprehension
Zhengyuan Liu | Nancy Chen
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Comprehending multi-turn spoken conversations is an emerging research area, presenting challenges different from reading comprehension of passages due to the interactive nature of information exchange from at least two speakers. Unlike passages, where sentences are often the default semantic modeling unit, in multi-turn conversations, a turn is a topically coherent unit embodied with immediately relevant context, making it a linguistically intuitive segment for computationally modeling verbal interactions. Therefore, in this work, we propose a hierarchical attention neural network architecture, combining turn-level and word-level attention mechanisms, to improve spoken dialogue comprehension performance. Experiments are conducted on a multi-turn conversation dataset, where nurses inquire and discuss symptom information with patients. We empirically show that the proposed approach outperforms standard attention baselines, achieves more efficient learning outcomes, and is more robust to lengthy and out-of-distribution test samples.

pdf bib
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems
Hung Le | Doyen Sahoo | Nancy Chen | Steven Hoi
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Developing Video-Grounded Dialogue Systems (VGDS), where a dialogue is conducted based on visual and audio aspects of a given video, is significantly more challenging than traditional image or text-grounded dialogue systems because (1) feature space of videos span across multiple picture frames, making it difficult to obtain semantic information; and (2) a dialogue agent must perceive and process information from different modalities (audio, video, caption, etc.) to obtain a comprehensive understanding. Most existing work is based on RNNs and sequence-to-sequence architectures, which are not very effective for capturing complex long-term dependencies (like in videos). To overcome this, we propose Multimodal Transformer Networks (MTN) to encode videos and incorporate information from different modalities. We also propose query-aware attention through an auto-encoder to extract query-aware features from non-text modalities. We develop a training procedure to simulate token-level decoding to improve the quality of generated responses during inference. We get state of the art performance on Dialogue System Technology Challenge 7 (DSTC7). Our model also generalizes to another multimodal visual-grounded dialogue task, and obtains promising performance.

pdf bib
Set to Ordered Text: Generating Discharge Instructions from Medical Billing Codes
Litton J Kurisinkel | Nancy Chen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We present set to ordered text, a natural language generation task applied to automatically generating discharge instructions from admission ICD (International Classification of Diseases) codes. This task differs from other natural language generation tasks in the following ways: (1) The input is a set of identifiable entities (ICD codes) where the relations between individual entity are not explicitly specified. (2) The output text is not a narrative description (e.g. news articles) composed from the input. Rather, inferences are made from the input (symptoms specified in ICD codes) to generate the output (instructions). (3) There is an optimal order in which each sentence (instruction) should appear in the output. Unlike most other tasks, neither the input (ICD codes) nor their corresponding symptoms appear in the output, so the ordering of the output instructions needs to be learned in an unsupervised fashion. Based on clinical intuition, we hypothesize that each instruction in the output is mapped to a subset of ICD codes specified in the input. We propose a neural architecture that jointly models (a) subset selection: choosing relevant subsets from a set of input entities; (b) content ordering: learning the order of instructions; and (c) text generation: representing the instructions corresponding to the selected subsets in natural language. In addition, we penalize redundancy during beam search to improve tractability for long text generation. Our model outperforms baseline models in BLEU scores and human evaluation. We plan to extend this work to other tasks such as recipe generation from ingredients.

pdf bib
Exploiting Discourse-Level Segmentation for Extractive Summarization
Zhengyuan Liu | Nancy Chen
Proceedings of the 2nd Workshop on New Frontiers in Summarization

Extractive summarization selects and concatenates the most essential text spans in a document. Most, if not all, neural approaches use sentences as the elementary unit to select content for summarization. However, semantic segments containing supplementary information or descriptive details are often nonessential in the generated summaries. In this work, we propose to exploit discourse-level segmentation as a finer-grained means to more precisely pinpoint the core content in a document. We investigate how the sub-sentential segmentation improves extractive summarization performance when content selection is modeled through two basic neural network architectures and a deep bi-directional transformer. Experiment results on the CNN/Daily Mail dataset show that discourse-level segmentation is effective in both cases. In particular, we achieve state-of-the-art performance when discourse-level segmentation is combined with our adapted contextual representation model.

bib
Acoustic Characterization of Singaporean Children’s English: Comparisons to American and British Counterparts
Yuling Gu | Nancy Chen
Proceedings of the 2019 Workshop on Widening NLP

We investigate English pronunciation patterns in Singaporean children in relation to their American and British counterparts by conducting archetypal analysis on selected vowel pairs. Given that Singapore adopts British English as the institutional standard, one might expect Singaporean children to follow British pronunciation patterns, but we observe that Singaporean children also present similar patterns to Americans for TRAP-BATH spilt vowels: (1) British and Singaporean children both produce these vowels with a relatively lowered tongue height. (2) These vowels are more fronted for American and Singaporean children (p < 0.001). In addition, when comparing /æ/ and /ε/ productions, British speakers show the clearest distinction between the two vowels; Singaporean and American speakers exhibit a higher and more fronted tongue position for /æ/ (p < 0.001), causing /æ/ to be acoustically more similar to /ε/.

bib
Isolating the Effects of Modeling Recursive Structures: A Case Study in Pronunciation Prediction of Chinese Characters
Minh Nguyen | Gia H Ngo | Nancy Chen
Proceedings of the 2019 Workshop on Widening NLP

Finding that explicitly modeling structures leads to better generalization, we consider the task of predicting Cantonese pronunciations of logographs (Chinese characters) using logographs’ recursive structures. This task is a suitable case study for two reasons. First, logographs’ pronunciations depend on structures (i.e. the hierarchies of sub-units in logographs) Second, the quality of logographic structures is consistent since the structures are constructed automatically using a set of rules. Thus, this task is less affected by confounds such as varying quality between annotators. Empirical results show that modeling structures explicitly using treeLSTM outperforms LSTM baseline, reducing prediction error by 6.0% relative.

pdf bib
Fast Prototyping a Dialogue Comprehension System for Nurse-Patient Conversations on Symptom Monitoring
Zhengyuan Liu | Hazel Lim | Nur Farah Ain Suhaimi | Shao Chuen Tong | Sharon Ong | Angela Ng | Sheldon Lee | Michael R. Macdonald | Savitha Ramasamy | Pavitra Krishnaswamy | Wai Leng Chow | Nancy F. Chen
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)

Data for human-human spoken dialogues for research and development are currently very limited in quantity, variety, and sources; such data are even scarcer in healthcare. In this work, we investigate fast prototyping of a dialogue comprehension system by leveraging on minimal nurse-to-patient conversations. We propose a framework inspired by nurse-initiated clinical symptom monitoring conversations to construct a simulated human-human dialogue dataset, embodying linguistic characteristics of spoken interactions like thinking aloud, self-contradiction, and topic drift. We then adopt an established bidirectional attention pointer network on this simulated dataset, achieving more than 80% F1 score on a held-out test set from real-world nurse-to-patient conversations. The ability to automatically comprehend conversations in the healthcare domain by exploiting only limited data has implications for improving clinical workflows through red flag symptom detection and triaging capabilities. We demonstrate the feasibility for efficient and effective extraction, retrieval and comprehension of symptom checking information discussed in multi-turn human-human spoken conversations.

2018

pdf bib
Proceedings of the Seventh Named Entities Workshop
Nancy Chen | Rafael E. Banchs | Xiangyu Duan | Min Zhang | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

pdf bib
Attention-based Semantic Priming for Slot-filling
Jiewen Wu | Rafael E. Banchs | Luis Fernando D’Haro | Pavitra Krishnaswamy | Nancy Chen
Proceedings of the Seventh Named Entities Workshop

The problem of sequence labelling in language understanding would benefit from approaches inspired by semantic priming phenomena. We propose that an attention-based RNN architecture can be used to simulate semantic priming for sequence labelling. Specifically, we employ pre-trained word embeddings to characterize the semantic relationship between utterances and labels. We validate the approach using varying sizes of the ATIS and MEDIA datasets, and show up to 1.4-1.9% improvement in F1 score. The developed framework can enable more explainable and generalizable spoken language understanding systems.

pdf bib
NEWS 2018 Whitepaper
Nancy Chen | Xiangyu Duan | Min Zhang | Rafael E. Banchs | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

Transliteration is defined as phonetic translation of names across languages. Transliteration of Named Entities (NEs) is necessary in many applications, such as machine translation, corpus alignment, cross-language IR, information extraction and automatic lexicon acquisition. All such systems call for high-performance transliteration, which is the focus of shared task in the NEWS 2018 workshop. The objective of the shared task is to promote machine transliteration research by providing a common benchmarking platform for the community to evaluate the state-of-the-art technologies.

pdf bib
Report of NEWS 2018 Named Entity Transliteration Shared Task
Nancy Chen | Rafael E. Banchs | Min Zhang | Xiangyu Duan | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

This report presents the results from the Named Entity Transliteration Shared Task conducted as part of The Seventh Named Entities Workshop (NEWS 2018) held at ACL 2018 in Melbourne, Australia. Similar to previous editions of NEWS, the Shared Task featured 19 tasks on proper name transliteration, including 13 different languages and two different Japanese scripts. A total of 6 teams from 8 different institutions participated in the evaluation, submitting 424 runs, involving different transliteration methodologies. Four performance metrics were used to report the evaluation results. The NEWS shared task on machine transliteration has successfully achieved its objectives by providing a common ground for the research community to conduct comparative evaluations of state-of-the-art technologies that will benefit the future research and development in this area.

pdf bib
Statistical Machine Transliteration Baselines for NEWS 2018
Snigdha Singhania | Minh Nguyen | Gia H. Ngo | Nancy Chen
Proceedings of the Seventh Named Entities Workshop

This paper reports the results of our trans-literation experiments conducted on NEWS 2018 Shared Task dataset. We focus on creating the baseline systems trained using two open-source, statistical transliteration tools, namely Sequitur and Moses. We discuss the pre-processing steps performed on this dataset for both the systems. We also provide a re-ranking system which uses top hypotheses from Sequitur and Moses to create a consolidated list of transliterations. The results obtained from each of these models can be used to present a good starting point for the participating teams.

pdf bib
Multimodal neural pronunciation modeling for spoken languages with logographic origin
Minh Nguyen | Gia H. Ngo | Nancy Chen
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Graphemes of most languages encode pronunciation, though some are more explicit than others. Languages like Spanish have a straightforward mapping between its graphemes and phonemes, while this mapping is more convoluted for languages like English. Spoken languages such as Cantonese present even more challenges in pronunciation modeling: (1) they do not have a standard written form, (2) the closest graphemic origins are logographic Han characters, of which only a subset of these logographic characters implicitly encodes pronunciation. In this work, we propose a multimodal approach to predict the pronunciation of Cantonese logographic characters, using neural networks with a geometric representation of logographs and pronunciation of cognates in historically related languages. The proposed framework improves performance by 18.1% and 25.0% respective to unimodal and multimodal baselines.

2016

pdf bib
Regulating Orthography-Phonology Relationship for English to Thai Transliteration
Binh Minh Nguyen | Hoang Gia Ngo | Nancy F. Chen
Proceedings of the Sixth Named Entity Workshop

pdf bib
Clustering-based Phonetic Projection in Mismatched Crowdsourcing Channels for Low-resourced ASR
Wenda Chen | Mark Hasegawa-Johnson | Nancy Chen | Preethi Jyothi | Lav Varshney
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)

Acquiring labeled speech for low-resource languages is a difficult task in the absence of native speakers of the language. One solution to this problem involves collecting speech transcriptions from crowd workers who are foreign or non-native speakers of a given target language. From these mismatched transcriptions, one can derive probabilistic phone transcriptions that are defined over the set of all target language phones using a noisy channel model. This paper extends prior work on deriving probabilistic transcriptions (PTs) from mismatched transcriptions by 1) modelling multilingual channels and 2) introducing a clustering-based phonetic mapping technique to improve the quality of PTs. Mismatched crowdsourcing for multilingual channels has certain properties of projection mapping, e.g., it can be interpreted as a clustering based on singular value decomposition of the segment alignments. To this end, we explore the use of distinctive feature weights, lexical tone confusions, and a two-step clustering algorithm to learn projections of phoneme segments from mismatched multilingual transcriber languages to the target language. We evaluate our techniques using mismatched transcriptions for Cantonese speech acquired from native English and Mandarin speakers. We observe a 5-9% relative reduction in phone error rate for the predicted Cantonese phone transcriptions using our proposed techniques compared with the previous PT method.