Tosho Hirasawa


2024

Pruning Multilingual Large Language Models for Multilingual Inference
Hwichan Kim | Jun Suzuki | Tosho Hirasawa | Mamoru Komachi
Findings of the Association for Computational Linguistics: EMNLP 2024

Multilingual large language models (MLLMs), trained on balanced multilingual data, demonstrate better zero-shot learning performance in non-English languages than large language models trained on English-dominant data. However, the performance disparity between English and non-English languages remains a challenge yet to be fully addressed. This study introduces a promising direction for enhancing non-English performance through a specialized pruning approach. Specifically, we prune MLLMs using bilingual sentence pairs from English and other languages and empirically demonstrate that this pruning strategy can enhance the MLLMs’ performance in non-English languages.
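
As a concrete illustration of the idea, the sketch below prunes a single linear layer using an activation-aware importance score computed from calibration inputs, in the spirit of Wanda-style pruning; the paper's exact criterion and data pipeline are not specified here, and the random tensors merely stand in for encoded bilingual sentence pairs.

```python
# A minimal sketch of activation-aware pruning with bilingual calibration
# data. The importance score and the stand-in calibration tensors are
# illustrative assumptions, not the paper's exact method.
import torch

def prune_linear(weight: torch.Tensor, calib_inputs: torch.Tensor,
                 sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the lowest-importance weights of a linear layer.

    Importance of w_ij is |w_ij| * ||x_j||, where ||x_j|| is the norm of
    input feature j over the calibration set (here: bilingual pairs).
    """
    feat_norm = calib_inputs.norm(p=2, dim=0)   # (in_features,)
    importance = weight.abs() * feat_norm       # (out_features, in_features)
    k = int(importance.numel() * sparsity)
    threshold = importance.flatten().kthvalue(k).values
    return weight * (importance > threshold)

# Toy usage: random features stand in for encoded en-xx sentence pairs.
torch.manual_seed(0)
weight = torch.randn(8, 16)
calib = torch.randn(32, 16)
pruned = prune_linear(weight, calib, sparsity=0.5)
print((pruned == 0).float().mean())  # ~0.5 sparsity
```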

Toward Structured Related Work Generation with Novelty Statements
Kazuya Nishimura | Kuniaki Saito | Tosho Hirasawa | Yoshitaka Ushiku
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)

To help readers understand the novelty and the research context, an excellent related work section is structured (i.e., it consists of paragraphs organized by grouping the cited papers into topics) and includes descriptions of novelty. However, previous studies have treated related work generation as multi-document summarization, ignoring both structure and novelty statements. In this paper, we redefine the related work generation task as summarization with structure (i.e., multiple paragraphs with citations) and novelty statements. For this task, we propose a quality-oriented dataset and evaluation metrics. We evaluated state-of-the-art language models on our task and confirmed both the limitations of current models and the validity of the proposed evaluation metrics.

OSX at Context24: How Well Can GPT Tackle Contextualizing Scientific Figures and Tables
Tosho Hirasawa
Proceedings of the Fourth Workshop on Scholarly Document Processing (SDP 2024)

Identifying the alignment between different parts of a scientific paper is fundamental to scholarly document processing. In the Context24 shared task, participants are given a scientific claim and asked to identify (1) key figures or tables that support the claim and (2) methodological details. While employing a supervised approach to train models on task-specific data is a prevailing strategy for both subtasks, such an approach is not feasible for low-resource domains. Therefore, this paper introduces data-free systems supported by Large Language Models. We propose systems based on GPT-4o and GPT-4-turbo for each task. The experimental results reveal the zero-shot capabilities of GPT-4* in both tasks.
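
For flavor, here is a minimal zero-shot sketch of the first subtask (matching a claim to supporting figures or tables). The prompt wording and output format are illustrative assumptions, not the submitted system's actual prompt; it uses the standard OpenAI Python client and needs an OPENAI_API_KEY in the environment.

```python
# A minimal zero-shot sketch of claim-to-figure matching with GPT-4o.
# Prompt and output format are hypothetical, not the system's own.
from openai import OpenAI

client = OpenAI()

def rank_figures(claim: str, captions: list[str]) -> str:
    numbered = "\n".join(f"{i}: {c}" for i, c in enumerate(captions))
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "Given the scientific claim below, rank the figure/table "
                "captions by how well they support it. Answer with a "
                f"comma-separated list of indices.\n\nClaim: {claim}\n\n"
                f"Captions:\n{numbered}"
            ),
        }],
    )
    return resp.choices[0].message.content

print(rank_figures("Model X outperforms the baseline on task Y.",
                   ["Figure 1: accuracy of X vs. baseline on Y.",
                    "Table 2: dataset statistics."]))
```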

TMU-HIT at MLSP 2024: How Well Can GPT-4 Tackle Multilingual Lexical Simplification?
Taisei Enomoto | Hwichan Kim | Tosho Hirasawa | Yoshinari Nagai | Ayako Sato | Kyotaro Nakajima | Mamoru Komachi
Proceedings of the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

Lexical simplification (LS) is the process of replacing complex words with simpler alternatives to help readers understand sentences seamlessly. The process is divided into two primary subtasks: assessing word complexities and replacing high-complexity words with simpler alternatives. Employing task-specific supervised data to train models is a prevalent strategy for addressing these subtasks. However, such an approach cannot be employed for low-resource languages. Therefore, this paper introduces a multilingual LS pipeline system that does not rely on supervised data. Specifically, we developed systems based on GPT-4 for each subtask. Our systems demonstrated top-class performance on both tasks in many languages. The results indicate that GPT-4 can effectively assess lexical complexity and simplify complex words in a multilingual context with high quality.
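
The two-subtask pipeline can be sketched in a few lines. Below, `ask` is a pluggable hook standing in for a GPT-4 call; the prompts and the 0.5 threshold are illustrative assumptions, not the system's actual configuration.

```python
# A minimal sketch of a two-step LS pipeline: score complexity first,
# then substitute only high-complexity words. Prompts are hypothetical.
def simplify(word, sentence, lang, ask, threshold=0.5):
    score = float(ask(
        f"Rate the complexity of '{word}' in this {lang} sentence "
        f"from 0 (simple) to 1 (complex): {sentence}"))
    if score < threshold:
        return word  # already simple enough; leave it unchanged
    return ask(
        f"Give one simpler {lang} word that can replace '{word}' "
        f"in this sentence: {sentence}")

# Toy usage with canned responses instead of a real GPT-4 call.
canned = iter(["0.9", "help"])
print(simplify("facilitate", "Grants facilitate research.", "English",
               lambda prompt: next(canned)))  # -> 'help'
```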

2023

Does Masked Language Model Pre-training with Artificial Data Improve Low-resource Neural Machine Translation?
Hiroto Tamura | Tosho Hirasawa | Hwichan Kim | Mamoru Komachi
Findings of the Association for Computational Linguistics: EACL 2023

Pre-training masked language models (MLMs) with artificial data has been proven beneficial for several natural language processing tasks such as natural language understanding and summarization; however, it has been less explored for neural machine translation (NMT). A previous study revealed the benefit of transfer learning for NMT in a limited setup, which differs from MLM. In this study, we prepared two kinds of artificial data and compared the translation performance of NMT when pre-trained on them with MLM. In addition to random sequences, we created artificial data mimicking token frequency information from the real world. Our results showed that pre-training models on artificial data by MLM improves translation performance in low-resource situations. Additionally, we found that pre-training on artificial data created considering token frequency information facilitates further improvement.
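
One simple way to build frequency-aware artificial data is to sample tokens from a Zipfian distribution, as sketched below; the paper's exact generation recipe may differ.

```python
# A minimal sketch of artificial pre-training data whose token
# frequencies follow Zipf's law, one way to mimic real-world frequency
# information. Vocabulary size and sentence length are arbitrary here.
import random

def zipfian_corpus(vocab_size=1000, n_sents=5, sent_len=12, seed=0):
    rng = random.Random(seed)
    vocab = [f"tok{i}" for i in range(vocab_size)]
    # Token at rank r is sampled with weight proportional to 1 / (r + 1).
    weights = [1.0 / (rank + 1) for rank in range(vocab_size)]
    return [" ".join(rng.choices(vocab, weights=weights, k=sent_len))
            for _ in range(n_sents)]

for sent in zipfian_corpus():
    print(sent)
```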

Simultaneous Domain Adaptation of Tokenization and Machine Translation
Taisei Enomoto | Tosho Hirasawa | Hwichan Kim | Teruaki Oka | Mamoru Komachi
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

Visual Prediction Improves Zero-Shot Cross-Modal Machine Translation
Tosho Hirasawa | Emanuele Bugliarello | Desmond Elliott | Mamoru Komachi
Proceedings of the Eighth Conference on Machine Translation

Multimodal machine translation (MMT) systems have been successfully developed in recent years for a few language pairs. However, training such models usually requires tuples of a source language text, target language text, and images. Obtaining these data involves expensive human annotations, making it difficult to develop models for unseen text-only language pairs. In this work, we propose the task of zero-shot cross-modal machine translation aiming to transfer multimodal knowledge from an existing multimodal parallel corpus into a new translation direction. We also introduce a novel MMT model with a visual prediction network to learn visual features grounded on multimodal parallel data and provide pseudo-features for text-only language pairs. With this training paradigm, our MMT model outperforms its text-only counterpart. In our extensive analyses, we show that (i) the selection of visual features is important, and (ii) training on image-aware translations and being grounded on a similar language pair are mandatory.
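
A minimal sketch of the visual prediction idea follows, assuming a pooled text representation, an MLP predictor, and an MSE grounding loss; the dimensions and objective are illustrative, not the paper's exact architecture.

```python
# A minimal sketch of a visual prediction network: map a pooled text
# representation to a pseudo image feature. It is trained on multimodal
# parallel data and supplies pseudo-features for text-only pairs.
import torch
import torch.nn as nn

class VisualPredictor(nn.Module):
    def __init__(self, text_dim=512, img_dim=2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim, 1024), nn.ReLU(), nn.Linear(1024, img_dim))

    def forward(self, enc_states):               # (batch, len, text_dim)
        return self.mlp(enc_states.mean(dim=1))  # pooled -> (batch, img_dim)

pred = VisualPredictor()
enc = torch.randn(4, 20, 512)   # stand-in encoder states
img = torch.randn(4, 2048)      # stand-in image features (e.g., a CNN's)
loss = nn.functional.mse_loss(pred(enc), img)  # grounding loss on MMT data
```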

2022

Construction of a Quality Estimation Dataset for Automatic Evaluation of Japanese Grammatical Error Correction
Daisuke Suzuki | Yujin Takahashi | Ikumi Yamashita | Taichi Aida | Tosho Hirasawa | Michitaka Nakatsuji | Masato Mita | Mamoru Komachi
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In grammatical error correction (GEC), automatic evaluation is considered an important factor for the research and development of GEC systems. Previous studies on automatic evaluation have shown that quality estimation models built from datasets with manual evaluations can achieve high performance in the automatic evaluation of English GEC. However, quality estimation models have not yet been studied for Japanese, because no datasets exist for constructing them. In this study, we therefore created a quality estimation dataset with manual evaluations to build an automatic evaluation model for Japanese GEC. By building a quality estimation model using this dataset and conducting a meta-evaluation, we verified the usefulness of the quality estimation model for Japanese GEC.
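
A quality estimation model of this kind can be as simple as a regression head over sentence embeddings trained against the manual scores; the sketch below shows the shape of such a model, with the encoder and training details left as assumptions.

```python
# A minimal sketch of a sentence-level QE regressor: a linear head that
# predicts a manual quality score from a sentence embedding. The paper's
# actual encoder and objective are not specified here.
import torch
import torch.nn as nn

class QERegressor(nn.Module):
    def __init__(self, emb_dim=768):
        super().__init__()
        self.head = nn.Linear(emb_dim, 1)

    def forward(self, sent_emb):                 # (batch, emb_dim)
        return self.head(sent_emb).squeeze(-1)   # predicted quality score

model = QERegressor()
emb = torch.randn(8, 768)   # stand-in sentence embeddings
score = torch.rand(8)       # stand-in manual scores in [0, 1]
loss = nn.functional.mse_loss(model(emb), score)
```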

2021

Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation
Seiichiro Kondo | Kengo Hotate | Tosho Hirasawa | Masahiro Kaneko | Mamoru Komachi
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

Recently, neural machine translation has been widely used because of its high translation accuracy, but it is also known to perform poorly on long sentences, a tendency that is especially prominent for low-resource languages. We assume that these problems arise because long sentences are scarce in the training data. Therefore, we propose a data augmentation method for handling long sentences. Our method is simple: we use only the given parallel corpora as training data and generate long sentences by concatenating two sentences. Our experiments confirm that, despite its simplicity, the proposed data augmentation improves long sentence translation. Moreover, the proposed method improves translation quality further when combined with back-translation.
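
The augmentation itself is a few lines of code: pick two sentence pairs from the parallel corpus and concatenate them on both sides. The random pairing below is an assumption; the paper may pair sentences differently.

```python
# A minimal sketch of concatenation-based data augmentation for NMT:
# form longer synthetic training pairs from existing parallel sentences.
import random

def concat_augment(src_sents, tgt_sents, n_aug, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(src_sents)))
    aug = []
    for _ in range(n_aug):
        i, j = rng.sample(idx, 2)  # pick two distinct sentence pairs
        aug.append((src_sents[i] + " " + src_sents[j],
                    tgt_sents[i] + " " + tgt_sents[j]))
    return aug

src = ["i like cats .", "it rains today .", "see you soon ."]
tgt = ["ich mag katzen .", "es regnet heute .", "bis bald ."]
print(concat_augment(src, tgt, 2))
```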

Machine Translation with Pre-specified Target-side Words Using a Semi-autoregressive Model
Seiichiro Kondo | Aomi Koyama | Tomoshige Kiyuna | Tosho Hirasawa | Mamoru Komachi
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

We introduce our TMU Japanese-to-English system, which employs a semi-autoregressive model, to tackle the WAT 2021 restricted translation task. In this task, we translate an input sentence under the constraint that some words, called restricted target vocabularies (RTVs), must be contained in the output sentence. To satisfy this constraint, we use a semi-autoregressive model, namely RecoverSAT, due to its ability (known as “forced translation”) to insert specified words into the output sentence. When using forced translation, the order in which the RTVs are inserted is a critical problem. In our system, we obtain word alignments between the source sentence and the corresponding RTVs using GIZA++ and then sort the RTVs in the order of their corresponding words or phrases in the source sentence. Using the model with the sorted RTVs, we succeeded in inserting all the RTVs into the output sentences for more than 96% of the test sentences. Moreover, we confirmed that sorting the RTVs improved the BLEU score compared with a random RTV order.
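
The ordering step reduces to sorting the RTVs by the positions of their aligned source words. A toy sketch, with the GIZA++ alignment replaced by a hand-written dict:

```python
# A minimal sketch of RTV ordering: sort RTVs by the position of the
# source word each one is aligned to. The dict is a hypothetical
# stand-in for real GIZA++ alignments.
def sort_rtvs(rtvs, alignment):
    """alignment maps each RTV to the index of its aligned source word."""
    return sorted(rtvs, key=lambda rtv: alignment[rtv])

alignment = {"station": 5, "ticket": 1}
print(sort_rtvs(["station", "ticket"], alignment))  # ['ticket', 'station']
```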

2020

English-to-Japanese Diverse Translation by Combining Forward and Backward Outputs
Masahiro Kaneko | Aizhan Imankulova | Tosho Hirasawa | Mamoru Komachi
Proceedings of the Fourth Workshop on Neural Generation and Translation

We introduce our TMU system submitted to the English-to-Japanese (En→Ja) track of the Simultaneous Translation And Paraphrase for Language Education (STAPLE) shared task at the 4th Workshop on Neural Generation and Translation (WNGT 2020). In most cases, machine translation systems generate a single output from the input sentence; however, to give language learners better and more diverse feedback, it is helpful to build a machine translation system that can produce diverse translations of each input sentence. Creating such systems, however, usually requires complex modifications to a model to ensure the diversity of outputs. In this paper, we investigated whether such a system can be created in a simple way and whether it can produce the desired diverse outputs. In particular, we combined the outputs of forward and backward neural machine translation (NMT) models. Despite adopting only this simple approach, our system achieved third place in the En→Ja track.
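
A minimal sketch of the combination step, assuming each model produces an n-best list of (translation, score) pairs; how scores are normalized across the two models is an assumption here.

```python
# A minimal sketch of combining forward and backward model outputs:
# merge the two n-best lists, deduplicate, and keep the top-k by score.
def combine_outputs(fwd_nbest, bwd_nbest, k=10):
    """Each n-best list holds (translation, log_prob) pairs."""
    merged = {}
    for trans, score in fwd_nbest + bwd_nbest:
        if trans not in merged or score > merged[trans]:
            merged[trans] = score  # keep the best score per translation
    ranked = sorted(merged.items(), key=lambda ts: ts[1], reverse=True)
    return [trans for trans, _ in ranked[:k]]

fwd = [("neko ga suki desu", -0.3), ("neko ga daisuki", -0.9)]
bwd = [("neko ga suki da", -0.5), ("neko ga suki desu", -0.4)]
print(combine_outputs(fwd, bwd, k=3))
```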

Towards Multimodal Simultaneous Neural Machine Translation
Aizhan Imankulova | Masahiro Kaneko | Tosho Hirasawa | Mamoru Komachi
Proceedings of the Fifth Conference on Machine Translation

Simultaneous translation involves translating a sentence before the speaker’s utterance is completed in order to realize real-time understanding in multiple languages. This task is significantly more challenging than the general full sentence translation because of the shortage of input information during decoding. To alleviate this shortage, we propose multimodal simultaneous neural machine translation (MSNMT), which leverages visual information as an additional modality. Our experiments with the Multi30k dataset showed that MSNMT significantly outperforms its text-only counterpart in more timely translation situations with low latency. Furthermore, we verified the importance of visual information during decoding by performing an adversarial evaluation of MSNMT, where we studied how models behaved with incongruent input modality and analyzed the effect of different word order between source and target languages.
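
For context, here is a toy sketch of a wait-k read/write schedule, a common simultaneous decoding policy under which such models operate; the `step` stub stands in for the multimodal decoder (text prefix plus image features), and the fixed output length is a simplifying assumption.

```python
# A minimal sketch of wait-k simultaneous decoding: start generating
# after reading k source tokens, then alternate read/write. The toy
# assumption here is target length == source length.
def wait_k_decode(src_tokens, k, step):
    out = []
    for t in range(len(src_tokens)):
        visible = src_tokens[:min(t + k, len(src_tokens))]
        out.append(step(visible, out))  # model sees only the prefix
    return out

# Toy step: echo the newest visible source token (stand-in for a model).
print(wait_k_decode("ein mann fährt rad".split(), 2,
                    lambda visible, out: visible[-1]))
```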

Neural Machine Translation from Historical Japanese to Contemporary Japanese Using Diachronically Domain-Adapted Word Embeddings
Masashi Takaku | Tosho Hirasawa | Mamoru Komachi | Kanako Komiya
Proceedings of the 34th Pacific Asia Conference on Language, Information and Computation

Zero-shot North Korean to English Neural Machine Translation by Character Tokenization and Phoneme Decomposition
Hwichan Kim | Tosho Hirasawa | Mamoru Komachi
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

The primary limitation of North Korean to English translation is the lack of a parallel corpus; therefore, high translation accuracy cannot be achieved. To address this problem, we propose a zero-shot approach using South Korean data, which are remarkably similar to North Korean data. We train a neural machine translation model after tokenizing a South Korean text at the character level and decomposing characters into phonemes. We demonstrate that our method can effectively learn North Korean to English translation and improve the BLEU scores by +1.01 points in comparison with the baseline.
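
The phoneme decomposition can be done with Unicode arithmetic alone: each precomposed Hangul syllable encodes its initial consonant, vowel, and optional final consonant. A minimal sketch:

```python
# A minimal sketch of Hangul phoneme decomposition: split each syllable
# block (U+AC00..U+D7A3) into its jamo via Unicode arithmetic.
def decompose(text: str) -> str:
    out = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:            # precomposed Hangul syllable
            idx = code - 0xAC00
            lead, vowel, tail = idx // 588, (idx % 588) // 28, idx % 28
            out.append(chr(0x1100 + lead))      # choseong (initial consonant)
            out.append(chr(0x1161 + vowel))     # jungseong (vowel)
            if tail:
                out.append(chr(0x11A7 + tail))  # jongseong (final consonant)
        else:
            out.append(ch)
    return " ".join(out)

print(decompose("한국"))  # ᄒ ᅡ ᆫ ᄀ ᅮ ᆨ
```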

Translation of New Named Entities from English to Chinese
Zizheng Zhang | Tosho Hirasawa | Wei Houjing | Masahiro Kaneko | Mamoru Komachi
Proceedings of the 7th Workshop on Asian Translation

New things are being created and new words are constantly being added to languages worldwide. However, it is not practical to translate them all manually into a new foreign language. When translating from an alphabetic language such as English to Chinese, appropriate Chinese characters must be assigned, which is particularly costly compared to other language pairs. Therefore, we propose a task of generating and evaluating new translations from English to Chinese focusing on named entities. We defined three criteria for human evaluation—fluency, adequacy of pronunciation, and adequacy of meaning—and constructed evaluation data based on these definitions. In addition, we built a baseline system and analyzed the output of the system.

TMU Japanese-English Multimodal Machine Translation System for WAT 2020
Hiroto Tamura | Tosho Hirasawa | Masahiro Kaneko | Mamoru Komachi
Proceedings of the 7th Workshop on Asian Translation

We introduce our TMU system submitted to the Japanese↔English Multimodal Task (constrained) for WAT 2020 (Nakazawa et al., 2020). This task aims to improve translation performance with the help of another modality (images) associated with the input sentences. In a multimodal translation task, the dataset is, by its nature, a low-resource one. Our method augments the data by generating noisy translations and adding noise to the existing training images. Subsequently, we pretrain a translation model on the augmented noisy data, and then fine-tune it on the clean data. We also examine the probabilistic dropping of either the textual or visual context vector in the decoder. This aims to regularize the network to make use of both features while training. The experimental results indicate that translation performance can be improved using our method of textual data augmentation with noising on the target side and probabilistic dropping of either context vector.
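
The probabilistic dropping can be sketched as follows: during training, occasionally zero out one of the two context vectors so the decoder cannot rely on either alone. The drop probability and the 50/50 choice below are assumptions.

```python
# A minimal sketch of probabilistic context dropping: randomly zero out
# either the textual or the visual context vector during training.
import torch

def drop_context(text_ctx, vis_ctx, p=0.15, training=True):
    if training and torch.rand(()) < p:
        if torch.rand(()) < 0.5:
            text_ctx = torch.zeros_like(text_ctx)  # drop textual context
        else:
            vis_ctx = torch.zeros_like(vis_ctx)    # drop visual context
    return text_ctx, vis_ctx

text_ctx, vis_ctx = torch.randn(4, 512), torch.randn(4, 2048)
text_ctx, vis_ctx = drop_context(text_ctx, vis_ctx)
```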

Korean-to-Japanese Neural Machine Translation System using Hanja Information
Hwichan Kim | Tosho Hirasawa | Mamoru Komachi
Proceedings of the 7th Workshop on Asian Translation

In this paper, we describe our TMU neural machine translation (NMT) system submitted for the Patent task (Korean→Japanese) of the 7th Workshop on Asian Translation (WAT 2020, Nakazawa et al., 2020). We propose a novel method to train a Korean-to-Japanese translation model. Specifically, we focus on the vocabulary overlap of Korean Hanja words and Japanese Kanji words, and propose strategies to leverage Hanja information. Our experiment shows that Hanja information is effective within a specific domain, leading to an improvement in the BLEU scores by +1.09 points compared to the baseline.
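
One way to exploit the overlap is to rewrite Sino-Korean words in their Hanja forms so that they share surface tokens with Japanese Kanji, as sketched below with a toy dictionary; the paper's actual strategies may differ.

```python
# A minimal sketch of Hanja substitution: replace Sino-Korean words with
# Hanja forms via a dictionary so Korean source tokens can share surface
# forms with Japanese Kanji. The tiny dictionary is a hypothetical stand-in.
HANJA_DICT = {"학교": "學校", "대학": "大學"}  # Hangul -> Hanja (toy)

def to_hanja(sentence: str) -> str:
    return " ".join(HANJA_DICT.get(tok, tok) for tok in sentence.split())

print(to_hanja("대학 에서 학교 로"))  # 大學 에서 學校 로
```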

2019

Multimodal Machine Translation with Embedding Prediction
Tosho Hirasawa | Hayahide Yamagishi | Yukio Matsumura | Mamoru Komachi
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

Multimodal machine translation is an attractive application of neural machine translation (NMT). It helps computers to deeply understand visual objects and their relations with natural languages. However, multimodal NMT systems suffer from a shortage of available training data, resulting in poor performance for translating rare words. In NMT, pretrained word embeddings have been shown to improve NMT in low-resource domains, and a search-based approach has been proposed to address the rare word problem. In this study, we effectively combine these two approaches in the context of multimodal NMT and explore how we can take full advantage of pretrained word embeddings to better translate rare words. We report overall performance improvements of 1.24 METEOR and 2.49 BLEU and achieve an improvement of 7.67 F-score for rare word translation.
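
At decoding time, an embedding-prediction model picks the vocabulary word whose pretrained embedding is nearest to the predicted vector. A minimal sketch, with cosine similarity as an assumed search metric:

```python
# A minimal sketch of the nearest-neighbor lookup behind embedding
# prediction: the decoder emits a continuous vector and decoding picks
# the closest pretrained embedding. The metric is an assumption.
import torch

def nearest_word(pred_vec, emb_table):
    """Return the index of the pretrained embedding closest to pred_vec."""
    sims = torch.nn.functional.cosine_similarity(
        emb_table, pred_vec.unsqueeze(0), dim=1)
    return int(sims.argmax())

emb_table = torch.randn(1000, 300)              # stand-in pretrained embeddings
pred = emb_table[42] + 0.01 * torch.randn(300)  # noisy predicted vector
print(nearest_word(pred, emb_table))            # -> 42
```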

Debiasing Word Embeddings Improves Multimodal Machine Translation
Tosho Hirasawa | Mamoru Komachi
Proceedings of Machine Translation Summit XVII: Research Track