Chinese Character Decomposition for Neural MT with Multi-Word Expressions

Lifeng Han, Gareth Jones, Alan Smeaton, Paolo Bolzoni


Abstract
Chinese character decomposition has been used as a feature to enhance Machine Translation (MT) models, combining radicals into character-level and word-level models. Recent work has investigated ideograph-level and stroke-level embeddings. However, questions remain about which decomposition levels of Chinese character representations, such as radicals and strokes, are best suited for MT. To investigate the impact of Chinese decomposition embeddings in detail (radical, stroke, and intermediate levels) and how well these decompositions represent the meaning of the original character sequences, we carry out an analysis using both automated and human evaluation of MT. Furthermore, we investigate whether combining decomposed Multiword Expressions (MWEs) can enhance model learning. MWE integration into MT has seen more than a decade of exploration; however, decomposed MWEs have not previously been explored.
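To make the idea of decomposition levels concrete, here is a minimal Python sketch. It assumes a tiny hand-written decomposition table; the `DECOMP` dictionary and the `decompose_char` / `decompose_sequence` functions are hypothetical illustrations, not the authors' code in poethan/MWE4MT. A real system would draw on a full character-decomposition resource and could continue expanding down to the stroke level.

```python
# Hypothetical toy decomposition table: each character maps to its immediate
# components. Characters absent from the table are treated as atomic.
DECOMP = {
    "好": ["女", "子"],
    "明": ["日", "月"],
    "林": ["木", "木"],
    "森": ["木", "林"],
}


def decompose_char(ch, level):
    """Expand ch up to `level` times; level=None expands until atomic."""
    if level == 0 or ch not in DECOMP:
        return [ch]
    next_level = None if level is None else level - 1
    out = []
    for part in DECOMP[ch]:
        out.extend(decompose_char(part, next_level))
    return out


def decompose_sequence(text, level):
    """Replace every character in text with its components at the given level."""
    out = []
    for ch in text:
        out.extend(decompose_char(ch, level))
    return out


if __name__ == "__main__":
    print(decompose_char("森", 1))        # intermediate level: ['木', '林']
    print(decompose_char("森", None))     # fully decomposed: ['木', '木', '木']
    print(decompose_sequence("森林", 1))  # ['木', '林', '木', '木']
```

The `level` parameter mirrors the distinction between intermediate and fully decomposed representations. The same function could be applied to an extracted MWE such as 森林 to obtain a decomposed form of the whole expression.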
Anthology ID:
2021.nodalida-main.35
Volume:
Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
31 May–2 June
Year:
2021
Address:
Reykjavik, Iceland (Online)
Editors:
Simon Dobnik, Lilja Øvrelid
Venue:
NoDaLiDa
Publisher:
Linköping University Electronic Press, Sweden
Pages:
336–344
URL:
https://aclanthology.org/2021.nodalida-main.35
Cite (ACL):
Lifeng Han, Gareth Jones, Alan Smeaton, and Paolo Bolzoni. 2021. Chinese Character Decomposition for Neural MT with Multi-Word Expressions. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 336–344, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Cite (Informal):
Chinese Character Decomposition for Neural MT with Multi-Word Expressions (Han et al., NoDaLiDa 2021)
PDF:
https://aclanthology.org/2021.nodalida-main.35.pdf
Code:
poethan/MWE4MT