Proceedings of the 10th Workshop on Asian Translation

Toshiaki Nakazawa, Kazutaka Kinugawa, Hideya Mino, Isao Goto, Raj Dabre, Shohei Higashiyama, Shantipriya Parida, Makoto Morishita, Ondrej Bojar, Akiko Eriguchi, Yusuke Oda, Akiko Eriguchi, Chenhui Chu, Sadao Kurohashi (Editors)


Anthology ID:
2023.wat-1
Month:
September
Year:
2023
Address:
Macau SAR, China
Venue:
WAT
SIG:
Publisher:
Asia-Pacific Association for Machine Translation
URL:
https://aclanthology.org/2023.wat-1
DOI:
Bib Export formats:
BibTeX MODS XML EndNote
PDF:
https://aclanthology.org/2023.wat-1.pdf

pdf bib
Proceedings of the 10th Workshop on Asian Translation
Toshiaki Nakazawa | Kazutaka Kinugawa | Hideya Mino | Isao Goto | Raj Dabre | Shohei Higashiyama | Shantipriya Parida | Makoto Morishita | Ondrej Bojar | Akiko Eriguchi | Yusuke Oda | Akiko Eriguchi | Chenhui Chu | Sadao Kurohashi

pdf bib
Overview of the 10th Workshop on Asian Translation
Toshiaki Nakazawa | Kazutaka Kinugawa | Hideya Mino | Isao Goto | Raj Dabre | Shohei Higashiyama | Shantipriya Parida | Makoto Morishita | Ondřej Bojar | Akiko Eriguchi | Yusuke Oda | Chenhui Chu | Sadao Kurohashi

This paper presents the results of the shared tasks from the 10th workshop on Asian translation (WAT2023). For the WAT2023, 2 teams submitted their translation results for the human evaluation. We also accepted 1 research paper. About 40 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

pdf bib
Mitigating Domain Mismatch in Machine Translation via Paraphrasing
Hyuga Koretaka | Tomoyuki Kajiwara | Atsushi Fujita | Takashi Ninomiya

Quality of machine translation (MT) deteriorates significantly when translating texts having characteristics that differ from the training data, such as content domain. Although previous studies have focused on adapting MT models on a bilingual parallel corpus in the target domain, this approach is not applicable when no parallel data are available for the target domain or when utilizing black-box MT systems. To mitigate problems caused by such domain mismatch without relying on any corpus in the target domain, this study proposes a method to search for better translations by paraphrasing input texts of MT. To obtain better translations even for input texts from unforeknown domains, we generate their multiple paraphrases, translate each, and rerank the resulting translations to select the most likely one. Experimental results on Japanese-to-English translation reveal that the proposed method improves translation quality in terms of BLEU score for input texts from specific domains.

pdf bib
BITS-P at WAT 2023: Improving Indic Language Multimodal Translation by Image Augmentation using Diffusion Models
Amulya Dash | Hrithik Raj Gupta | Yashvardhan Sharma

This paper describes the proposed system for mutlimodal machine translation. We have participated in multimodal translation tasks for English into three Indic languages: Hindi, Bengali, and Malayalam. We leverage the inherent richness of multimodal data to bridge the gap of ambiguity in translation. We fine-tuned the ‘No Language Left Behind’ (NLLB) machine translation model for multimodal translation, further enhancing the model accuracy by image data augmentation using latent diffusion. Our submission achieves the best BLEU score for English-Hindi, English-Bengali, and English-Malayalam language pairs for both Evaluation and Challenge test sets.

pdf bib
OdiaGenAI’s Participation at WAT2023
Sk Shahid | Guneet Singh Kohli | Sambit Sekhar | Debasish Dhal | Adit Sharma | Shubhendra Kushwaha | Shantipriya Parida | Stig-Arne Grönroos | Satya Ranjan Dash

This paper offers an in-depth overview of the team “ODIAGEN’s” translation system submitted to the Workshop on Asian Translation (WAT2023). Our focus lies in the domain of Indic Multimodal tasks, specifically targeting English to Hindi, English to Malayalam, and English to Bengali translations. The system uses a state-of-the-art Transformer-based architecture, specifically the NLLB-200 model, fine-tuned with language-specific Visual Genome Datasets. With this robust system, we were able to manage both text-to-text and multimodal translations, demonstrating versatility in handling different translation modes. Our results showcase strong performance across the board, with particularly promising results in the Hindi and Bengali translation tasks. A noteworthy achievement of our system lies in its stellar performance across all text-to-text translation tasks. In the categories of English to Hindi, English to Bengali, and English to Malayalam translations, our system claimed the top positions for both the evaluation and challenge sets. This system not only advances our understanding of the challenges and nuances of Indic language translation but also opens avenues for future research to enhance translation accuracy and performance.