Automatic Bilingual Phrase Dictionary Construction from GIZA++ Output

Albina Khusainova, Vitaly Romanov, Adil Khan


Abstract
Modern encoder-decoder neural machine translation (NMT) models are normally trained on parallel sentences. Hence, they give the best results when translating full sentences rather than sentence parts. As a result, the task of translating commonly used phrases, which often arises for language learners, is not addressed by NMT models. While human-built phrase dictionaries exist for high-resource language pairs, less-resourced pairs do not have them. We suggest an approach for building such a dictionary automatically from GIZA++ output and show that it works significantly better than translating phrases with an NMT system trained on sentences.
Anthology ID:
2022.mwe-1.12
Volume:
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
Month:
June
Year:
2022
Address:
Marseille, France
Editors:
Archna Bhatia, Paul Cook, Shiva Taslimipoor, Marcos Garcia, Carlos Ramisch
Venue:
MWE
SIG:
SIGLEX
Publisher:
European Language Resources Association
Pages:
81–88
URL:
https://aclanthology.org/2022.mwe-1.12
Cite (ACL):
Albina Khusainova, Vitaly Romanov, and Adil Khan. 2022. Automatic Bilingual Phrase Dictionary Construction from GIZA++ Output. In Proceedings of the 18th Workshop on Multiword Expressions @LREC2022, pages 81–88, Marseille, France. European Language Resources Association.
Cite (Informal):
Automatic Bilingual Phrase Dictionary Construction from GIZA++ Output (Khusainova et al., MWE 2022)
PDF:
https://aclanthology.org/2022.mwe-1.12.pdf