Elizabeth Baran
2012
Annotating dropped pronouns in Chinese newswire text
Elizabeth Baran | Yaqin Yang | Nianwen Xue
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We propose an annotation framework to explicitly identify dropped subject pronouns in Chinese. We acknowledge and specify 10 concrete pronouns that exist as words in Chinese and 4 abstract pronouns that do not correspond to Chinese words but that are recognized conceptually by native Chinese speakers. These abstract pronouns are identified as "unspecified", "pleonastic", "event", and "existential" and are argued to exist cross-linguistically. We trained two annotators, fluent in Chinese, and adjudicated their annotations to form a gold standard. We achieved an inter-annotator agreement kappa of 0.6 and an observed agreement of 0.7. We found that annotators had the most difficulty with the abstract pronouns, such as "unspecified" and "event", but we posit that further specification and training have the potential to significantly improve these results. We believe that this annotated data will help improve Machine Translation models that translate from Chinese to a non-pro-drop language, like English, that requires all subject pronouns to be explicit.
2011
Singular or Plural? Exploiting Parallel Corpora for Chinese Number Prediction
Elizabeth Baran | Nianwen Xue
Proceedings of Machine Translation Summit XIII: Papers