Learning to Automatically Post-Edit Dropped Words in MT

Jacob Mundt, Kristen Parton, Kathleen McKeown


Abstract
Automatic post-editors (APEs) can improve the adequacy of MT output by detecting and reinserting dropped content words, but the location where these words are inserted is critical. In this paper, we describe a probabilistic approach for learning reinsertion rules for specific languages and MT systems, as well as a method for synthesizing training data from reference translations. We test the insertion logic on MT systems for Chinese to English and Arabic to English. Our adaptive APE inserts within 3 words of the best location 73% of the time (32% in the exact location) in Arabic-English MT output, and 67% of the time (30% in the exact location) in Chinese-English output. It also delivers improved performance on automated adequacy metrics over a previous rule-based approach to insertion. We consider which aspects of the insertion problem make it especially amenable to machine learning solutions.
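The abstract reports insertion quality as the share of cases landing within 3 words of the best location, and the share landing exactly on it. A minimal sketch of that kind of distance-based score follows, assuming each reinserted word has a single gold insertion index in the tokenized MT output; the function name `within_k_accuracy` and this position representation are hypothetical illustrations, not the paper's evaluation code.

```python
from typing import Sequence


def within_k_accuracy(predicted: Sequence[int], gold: Sequence[int], k: int) -> float:
    """Fraction of insertions whose predicted token index is within k of the gold index.

    Assumption (hypothetical): positions are integer insertion slots in the
    tokenized MT output, with exactly one gold slot per dropped word.
    k = 0 corresponds to "exact location"; k = 3 to "within 3 words".
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not predicted:
        return 0.0
    # Count predictions whose distance to the gold slot is at most k words.
    hits = sum(1 for p, g in zip(predicted, gold) if abs(p - g) <= k)
    return hits / len(predicted)


# Toy positions, not data from the paper.
pred = [3, 7, 10, 2]
gold = [3, 5, 15, 2]
print(within_k_accuracy(pred, gold, 0))  # exact-location rate: 0.5
print(within_k_accuracy(pred, gold, 3))  # within-3-words rate: 0.75
```

Because the within-k score is computed over the same predictions for every k, it can only stay the same or grow as k increases. This matches the pattern in the reported figures, where the exact-location rate is well below the within-3-words rate.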
Anthology ID:
2012.amta-wptp.5
Volume:
Workshop on Post-Editing Technology and Practice
Month:
October 28
Year:
2012
Address:
San Diego, California, USA
Editors:
Sharon O'Brien, Michel Simard, Lucia Specia
Venue:
AMTA
Publisher:
Association for Machine Translation in the Americas
URL:
https://aclanthology.org/2012.amta-wptp.5
Cite (ACL):
Jacob Mundt, Kristen Parton, and Kathleen McKeown. 2012. Learning to Automatically Post-Edit Dropped Words in MT. In Workshop on Post-Editing Technology and Practice, San Diego, California, USA. Association for Machine Translation in the Americas.
Cite (Informal):
Learning to Automatically Post-Edit Dropped Words in MT (Mundt et al., AMTA 2012)
PDF:
https://aclanthology.org/2012.amta-wptp.5.pdf