Evaluation Dataset for Zero Pronoun in Japanese to English Translation

Sho Shimazu, Sho Takase, Toshiaki Nakazawa, Naoaki Okazaki


Abstract
In natural language, we often omit words that are easily understood from context. In particular, subject, object, and possessive pronouns are frequently omitted in Japanese; these omitted pronouns are known as zero pronouns. When translating from Japanese into other languages, we need to identify the correct antecedent of each zero pronoun to generate a correct and coherent translation. However, conventional automatic evaluation metrics (e.g., BLEU) have difficulty isolating whether zero pronouns have been resolved correctly. Therefore, we present a hand-crafted dataset to evaluate whether translation models can resolve zero pronouns in Japanese-to-English translation. We validate, both manually and statistically, that our dataset can effectively evaluate the correctness of the antecedents selected in translations. Through translation experiments using our dataset, we reveal shortcomings of an existing context-aware neural machine translation model.
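The abstract does not describe how the dataset is scored. One common approach for targeted test sets of this kind is contrastive evaluation: a model scores a correct translation against variants that differ only in the pronoun, and an example counts as solved when the correct variant scores highest. The Python sketch below illustrates that idea under stated assumptions; the record fields, the `contrastive_accuracy` and `toy_score` names, the example sentences, and the scores are all hypothetical and are not taken from the paper or its released data.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical scorer signature: (context, source, target) -> score,
# e.g. a model's log-probability of the English target. Higher is better.
Scorer = Callable[[str, str, str], float]


@dataclass
class ContrastiveExample:
    """Hypothetical record layout, not the released dataset's format."""
    context: str           # preceding Japanese sentence(s)
    source: str            # Japanese sentence containing a zero pronoun
    correct: str           # English translation with the right pronoun
    incorrect: List[str]   # same translation with a wrong pronoun substituted


def contrastive_accuracy(examples: List[ContrastiveExample], score: Scorer) -> float:
    """Share of examples where the correct translation scores strictly higher
    than every incorrect candidate (ties count as failures)."""
    if not examples:
        return 0.0
    hits = 0
    for ex in examples:
        good = score(ex.context, ex.source, ex.correct)
        if all(good > score(ex.context, ex.source, bad) for bad in ex.incorrect):
            hits += 1
    return hits / len(examples)


# Toy usage: made-up scores standing in for a real model.
toy_logprobs = {
    "She bought a new bag.": -2.1,
    "He bought a new bag.": -3.4,
    "I bought a new bag.": -2.8,
}


def toy_score(context: str, source: str, target: str) -> float:
    # Ignores context and source; a real scorer would condition on both.
    return toy_logprobs[target]


examples = [
    ContrastiveExample(
        # "My older sister went to the department store yesterday."
        context="姉は昨日デパートに行った。",
        # "[She] bought a new bag." (subject omitted in Japanese)
        source="新しいかばんを買った。",
        correct="She bought a new bag.",
        incorrect=["He bought a new bag.", "I bought a new bag."],
    )
]

print(contrastive_accuracy(examples, toy_score))  # 1.0
```

In practice, `toy_score` would be replaced by a function returning a translation model's log-probability of each English candidate given the Japanese context and source sentence; requiring a strict margin means ties are treated as failures.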
Anthology ID: 2020.lrec-1.447
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Venue: LREC
Publisher: European Language Resources Association
Pages: 3630–3634
Language: English
URL: https://aclanthology.org/2020.lrec-1.447
Cite (ACL): Sho Shimazu, Sho Takase, Toshiaki Nakazawa, and Naoaki Okazaki. 2020. Evaluation Dataset for Zero Pronoun in Japanese to English Translation. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 3630–3634, Marseille, France. European Language Resources Association.
Cite (Informal): Evaluation Dataset for Zero Pronoun in Japanese to English Translation (Shimazu et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.447.pdf