Thot: a Toolkit To Train Phrase-based Statistical Translation Models

Daniel Ortiz-Martínez, Ismael García-Varea, Francisco Casacuberta


Abstract
In this paper, we present the Thot toolkit, a set of tools for training phrase-based models for statistical machine translation, which is publicly available as open-source software. The toolkit obtains phrase-based models from word-based alignment models; to our knowledge, this functionality has not been offered by any publicly available toolkit. The Thot toolkit also implements a new method for estimating phrase models, which yields more complete phrase models, including a segmentation length submodel, than those produced by the methods described in the literature. The toolkit output can be produced in different formats so that it can be used by other statistical machine translation tools such as Pharaoh, a beam search decoder for phrase-based alignment models that we used to perform translation experiments with the generated models. Additionally, the Thot toolkit can be used to obtain the best phrase-level alignment for a sentence pair.
Anthology ID:
2005.mtsummit-papers.19
Volume:
Proceedings of Machine Translation Summit X: Papers
Month:
September 13-15
Year:
2005
Address:
Phuket, Thailand
Venue:
MTSummit
Pages:
141–148
URL:
https://aclanthology.org/2005.mtsummit-papers.19
Cite (ACL):
Daniel Ortiz-Martínez, Ismael García-Varea, and Francisco Casacuberta. 2005. Thot: a Toolkit To Train Phrase-based Statistical Translation Models. In Proceedings of Machine Translation Summit X: Papers, pages 141–148, Phuket, Thailand.
Cite (Informal):
Thot: a Toolkit To Train Phrase-based Statistical Translation Models (Ortiz-Martínez et al., MTSummit 2005)
PDF:
https://aclanthology.org/2005.mtsummit-papers.19.pdf