Controlling Japanese Machine Translation Output by Using JLPT Vocabulary Levels

Alberto Poncelas, Ohnmar Htun


Abstract
In Neural Machine Translation (NMT) systems, there is generally little control over the lexicon of the output. Consequently, the translated output may be too difficult for certain audiences. For example, for people with limited knowledge of the language, vocabulary is a major impediment to understanding a text. In this work, we build a complexity-controllable NMT system for English-to-Japanese translation. More specifically, we aim to modulate the difficulty of the translation in terms of not only the vocabulary but also the use of kanji. To achieve this, we follow a sentence-tagging approach to influence the output.
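The sentence-tagging idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each training pair is labeled with the hardest JLPT level found among the words of its Japanese side (N5 being the easiest level and N1 the hardest), and that this label is prepended to the English source as a token. The tag format `<N3>`, the toy word list, and the function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import Dict, Iterable

# Hypothetical toy word list: word -> JLPT level (5 = easiest, 1 = hardest).
# A real system would load a full JLPT vocabulary list instead.
TOY_JLPT_LEVELS: Dict[str, int] = {
    "猫": 5,
    "食べる": 5,
    "経済": 3,
    "概念": 1,
}


def sentence_level(tokens: Iterable[str], levels: Dict[str, int]) -> int:
    """Return the hardest (numerically lowest) JLPT level among known tokens.

    Tokens missing from the word list are ignored. A sentence with no
    known tokens falls back to level 5 (an assumption of this sketch).
    """
    known = [levels[token] for token in tokens if token in levels]
    return min(known, default=5)


def tag_source(source: str, target_tokens: Iterable[str],
               levels: Dict[str, int]) -> str:
    """Prepend a level tag (assumed format: <N1>..<N5>) to the source sentence."""
    level = sentence_level(target_tokens, levels)
    return f"<N{level}> {source}"


if __name__ == "__main__":
    print(tag_source("The cat eats.", ["猫", "食べる"], TOY_JLPT_LEVELS))
    print(tag_source("Economic concepts.", ["経済", "概念"], TOY_JLPT_LEVELS))
```

Under this assumption, a model trained on such tagged pairs could be asked for a simpler translation at inference time by prepending the desired tag to the input sentence.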
Anthology ID:
2022.tsar-1.7
Volume:
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Virtual)
Editors:
Sanja Štajner, Horacio Saggion, Daniel Ferrés, Matthew Shardlow, Kim Cheng Sheang, Kai North, Marcos Zampieri, Wei Xu
Venue:
TSAR
Publisher:
Association for Computational Linguistics
Pages:
77–85
URL:
https://aclanthology.org/2022.tsar-1.7
DOI:
10.18653/v1/2022.tsar-1.7
Cite (ACL):
Alberto Poncelas and Ohnmar Htun. 2022. Controlling Japanese Machine Translation Output by Using JLPT Vocabulary Levels. In Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), pages 77–85, Abu Dhabi, United Arab Emirates (Virtual). Association for Computational Linguistics.
Cite (Informal):
Controlling Japanese Machine Translation Output by Using JLPT Vocabulary Levels (Poncelas & Htun, TSAR 2022)
PDF:
https://aclanthology.org/2022.tsar-1.7.pdf
Video:
https://aclanthology.org/2022.tsar-1.7.mp4