Towards a better understanding of statistical post-editing

Marion Potet, Laurent Besacier, Hervé Blanchon, Marwen Azouzi


Abstract
We describe several experiments to better understand the usefulness of statistical post-editing (SPE) for improving the raw outputs of phrase-based statistical MT (PBMT) systems. Whatever the size of the training corpus, we show that SPE systems trained on general-domain data offer no breakthrough over our baseline general-domain PBMT system. However, using manually post-edited system outputs to train the SPE led to a slight improvement in translation quality compared with the use of professional reference translations. We also show that SPE is far more effective for domain adaptation, mainly because it recovers many domain-specific terms unknown to our general PBMT system. Finally, we compare two domain adaptation approaches, post-editing a general-domain PBMT system versus building a new domain-adapted PBMT system (using two different techniques), and show that the latter outperforms the former. Yet, when the PBMT system is a “black box”, SPE trained with post-edited system outputs remains an interesting option for domain adaptation.
Anthology ID:
2012.iwslt-papers.19
Volume:
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
Month:
December 6-7
Year:
2012
Address:
Hong Kong
Venue:
IWSLT
SIG:
SIGSLT
Pages:
284–291
URL:
https://aclanthology.org/2012.iwslt-papers.19
Cite (ACL):
Marion Potet, Laurent Besacier, Hervé Blanchon, and Marwen Azouzi. 2012. Towards a better understanding of statistical post-editing. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 284–291, Hong Kong.
Cite (Informal):
Towards a better understanding of statistical post-editing (Potet et al., IWSLT 2012)
PDF:
https://aclanthology.org/2012.iwslt-papers.19.pdf