Carol Nichols


2006

pdf bib
Corpus Variations for Translation Lexicon Induction
Rebecca Hwa | Carol Nichols | Khalil Sima’an
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

Lexical mappings (word translations) between languages are an invaluable resource for multilingual processing. While the problem of extracting lexical mappings from parallel corpora is well-studied, the task is more challenging when the language samples are from non-parallel corpora. The goal of this work is to investigate one such scenario: finding lexical mappings between dialects of a diglossic language, in which people conduct their written communications in a prestigious formal dialect, but they communicate verbally in a colloquial dialect. Because the two dialects serve different socio-linguistic functions, parallel corpora do not naturally exist between them. An example of a diglossic dialect pair is Modern Standard Arabic (MSA) and Levantine Arabic. In this paper, we evaluate the applicability of a standard algorithm for inducing lexical mappings between comparable corpora (Rapp, 1999) to such diglossic corpora pairs. The focus of the paper is an in-depth error analysis, exploring the notion of relatedness in diglossic corpora and scrutinizing the effects of various dimensions of relatedness (such as mode, topic, style, and statistics) on the quality of the resulting translation lexicon.

2005

pdf bib
Word Alignment and Cross-Lingual Resource Acquisition
Carol Nichols | Rebecca Hwa
Proceedings of the ACL Interactive Poster and Demonstration Sessions