Johanes Effendi


2022

pdf bib
Benefiting from Language Similarity in the Multilingual MT Training: Case Study of Indonesian and Malaysian
Alberto Poncelas | Johanes Effendi
Proceedings of the Fifth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2022)

The development of machine translation (MT) has been successful in breaking the language barrier of the world’s top 10-20 languages. However, for the rest of it, delivering an acceptable translation quality is still a challenge due to the limited resource. To tackle this problem, most studies focus on augmenting data while overlooking the fact that we can borrow high-quality natural data from the closely-related language. In this work, we propose an MT model training strategy by increasing the language directions as a means of augmentation in a multilingual setting. Our experiment result using Indonesian and Malaysian on the state-of-the-art MT model showcases the effectiveness and robustness of our method.

pdf bib
Rakuten’s Participation in WAT 2022: Parallel Dataset Filtering by Leveraging Vocabulary Heterogeneity
Alberto Poncelas | Johanes Effendi | Ohnmar Htun | Sunil Yadav | Dongzhe Wang | Saurabh Jain
Proceedings of the 9th Workshop on Asian Translation

This paper introduces our neural machine translation system’s participation in the WAT 2022 shared translation task (team ID: sakura). We participated in the Parallel Data Filtering Task. Our approach based on Feature Decay Algorithms achieved +1.4 and +2.4 BLEU points for English to Japanese and Japanese to English respectively compared to the model trained on the full dataset, showing the effectiveness of FDA on in-domain data selection.

2018

pdf bib
Multi-paraphrase Augmentation to Leverage Neural Caption Translation
Johanes Effendi | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 15th International Conference on Spoken Language Translation

Paraphrasing has been proven to improve translation quality in machine translation (MT) and has been widely studied alongside with the development of statistical MT (SMT). In this paper, we investigate and utilize neural paraphrasing to improve translation quality in neural MT (NMT), which has not yet been much explored. Our first contribution is to propose a new way of creating a multi-paraphrase corpus through visual description. After that, we also proposed to construct neural paraphrase models which initiate expert models and utilize them to leverage NMT. Here, we diffuse the image information by using image-based paraphrasing without using the image itself. Our proposed image-based multi-paraphrase augmentation strategies showed improvement against a vanilla NMT baseline.