A Comparison of Different Punctuation Prediction Approaches in a Translation Context

Vincent Vandeghinste, Lyan Verwimp, Joris Pelemans, Patrick Wambacq


Abstract
We test a series of techniques for predicting punctuation and evaluate their effect on machine translation (MT) quality. Several techniques for punctuation prediction are compared: language modeling techniques, such as n-grams and long short-term memories (LSTMs), sequence labeling LSTMs (unidirectional and bidirectional), and monolingual phrase-based, hierarchical and neural MT. For actual translation, phrase-based, hierarchical and neural MT are investigated. We observe that for punctuation prediction, phrase-based statistical MT and neural MT reach similar results, and are best used as a preprocessing step which is followed by neural MT to perform the actual translation. Implicit punctuation insertion by a dedicated neural MT system, trained on unpunctuated source and punctuated target, yields similar results.
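As an illustration of two of the framings named in the abstract, the sketch below derives training data from one punctuated, tokenized sentence: per-word labels for a sequence labeling model, and an (unpunctuated source, punctuated target) pair for a monolingual MT-style system. The punctuation inventory, the "O" label, and the function names are assumptions made for this example; the abstract does not specify the authors' preprocessing, and casing is not handled here.

```python
# Assumption: an illustrative punctuation inventory, not the paper's.
PUNCT = {",", ".", "?", "!", ":", ";"}


def strip_punctuation(tokens):
    """Remove punctuation tokens, keeping the words in order."""
    return [t for t in tokens if t not in PUNCT]


def to_labels(tokens):
    """Pair each word with the punctuation mark that follows it, or "O".

    Sentence-initial punctuation is dropped, and if several marks follow
    one word, the last mark wins (a simplification for this sketch).
    """
    pairs = []
    for tok in tokens:
        if tok in PUNCT:
            if pairs:
                word, _ = pairs[-1]
                pairs[-1] = (word, tok)
        else:
            pairs.append((tok, "O"))
    return pairs


def make_mt_pair(tokens):
    """Build an (unpunctuated source, punctuated target) training pair."""
    return " ".join(strip_punctuation(tokens)), " ".join(tokens)


tokens = "yes , we can .".split()
labels = to_labels(tokens)
source, target = make_mt_pair(tokens)
```

The labeling view keeps input and output aligned word by word, whereas the MT view lets the output sequence be longer than the input, which is what allows a translation-style system to insert punctuation tokens.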
Anthology ID:
2018.eamt-main.27
Volume:
Proceedings of the 21st Annual Conference of the European Association for Machine Translation
Month:
May
Year:
2018
Address:
Alicante, Spain
Editors:
Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Miquel Esplà-Gomis, Maja Popović, Celia Rico, André Martins, Joachim Van den Bogaert, Mikel L. Forcada
Venue:
EAMT
Pages:
289–298
URL:
https://aclanthology.org/2018.eamt-main.27
Cite (ACL):
Vincent Vandeghinste, Lyan Verwimp, Joris Pelemans, and Patrick Wambacq. 2018. A Comparison of Different Punctuation Prediction Approaches in a Translation Context. In Proceedings of the 21st Annual Conference of the European Association for Machine Translation, pages 289–298, Alicante, Spain.
Cite (Informal):
A Comparison of Different Punctuation Prediction Approaches in a Translation Context (Vandeghinste et al., EAMT 2018)
PDF:
https://aclanthology.org/2018.eamt-main.27.pdf