Yusuke Oda - ACL Anthology

Yusuke Oda

2025

Massive Supervised Fine-tuning Experiments Reveal How Data, Layer, and Training Factors Shape LLM Alignment Quality
Yuto Harada | Yusuke Yamauchi | Yusuke Oda | Yohei Oseki | Yusuke Miyao | Yu Takagi
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Supervised fine-tuning (SFT) is a critical step in aligning large language models (LLMs) with human instructions and values, yet many aspects of SFT remain poorly understood. We trained a wide range of base models on a variety of datasets including code generation, mathematical reasoning, and general-domain tasks, resulting in 1,000+ SFT models under controlled conditions. We then identified the dataset properties that matter most and examined the layer-wise modifications introduced by SFT.Our findings reveal that some training–task synergies persist across all models while others vary substantially, emphasizing the importance of model-specific strategies. Moreover, we demonstrate that perplexity consistently predicts SFT effectiveness, often surpassing superficial similarity between the training data and the benchmark, and that mid-layer weight changes correlate most strongly with performance gains. We release these 1,000+ SFT models and benchmark results to accelerate further research. All resources are available at https://github.com/llm-jp/massive-sft.

Limited low-resource language corpora in professional domains like medicine hinder cross-lingual domain adaptation of pre-trained large language models (PLMs). While abundant English medical corpora could complement this scarcity, the effective mixture of English and target language, including machine-translated content, remains underexplored. We examined how linguistic features (e.g., token sizes and language proportions) affect performance on a Japanese–English medical knowledge benchmark. Through continued pre-training of a bilingual PLM on multilingual corpora with varying proportions of English and Japanese texts (both original and machine-translated), we analyzed correlations between linguistic features and fine-grained task performance. Our findings suggest a practical approach to optimizing multilingual corpora for cross-lingual domain adaptation, which requires leveraging specialized knowledge from English corpora while ensuring sufficient coverage of language-specific expressions in a target language (Japanese). Such insights will contribute to the development of multilingual models that effectively leverage English-language resources in various professional domains with low-resource languages.

Instability in Downstream Task Performance During LLM Pretraining
Yuto Nishida | Masaru Isonuma | Yusuke Oda
Findings of the Association for Computational Linguistics: EMNLP 2025

When training large language models (LLMs), it is common practice to track downstream task performance throughout the training process and select the checkpoint with the highest validation score.However, downstream metrics often exhibit substantial fluctuations, making it difficult to identify the checkpoint that truly represents the best-performing model.In this study, we empirically analyze the stability of downstream task performance in an LLM trained on diverse web-scale corpora.We find that task scores frequently fluctuate throughout training, both at the aggregate and example levels.To address this instability, we investigate two post-hoc checkpoint integration methods: checkpoint averaging and ensemble, motivated by the hypothesis that aggregating neighboring checkpoints can reduce performance volatility.We demonstrate both empirically and theoretically that these methods improve downstream performance stability without requiring any changes to the training procedure.

Developing Japanese CLIP Models Leveraging an Open-weight LLM for Large-scale Dataset Translation
Issa Sugiura | Shuhei Kurita | Yusuke Oda | Daisuke Kawahara | Naoaki Okazaki
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

CLIP is a foundational model that bridges images and text, widely adopted as a key component in numerous vision-language models.However, the lack of large-scale open Japanese image-text pairs poses a significant barrier to the development of Japanese vision-language models.In this study, we constructed a Japanese image-text pair dataset with 1.5 billion examples using machine translation with open-weight LLMs and pre-trained Japanese CLIP models on the dataset.The performance of the pre-trained models was evaluated across seven benchmark datasets, achieving competitive average scores compared to models of similar size without the need for extensive data curation. However, the results also revealed relatively low performance on tasks specific to Japanese culture, highlighting the limitations of translation-based approaches in capturing cultural nuances. Our dataset, models, and code are publicly available.

2023

Proceedings of the 10th Workshop on Asian Translation
Toshiaki Nakazawa | Kazutaka Kinugawa | Hideya Mino | Isao Goto | Raj Dabre | Shohei Higashiyama | Shantipriya Parida | Makoto Morishita | Ondřej Bojar | Akiko Eriguchi | Yusuke Oda | Chenhui Chu | Sadao Kurohashi
Proceedings of the 10th Workshop on Asian Translation

Overview of the 10th Workshop on Asian Translation
Toshiaki Nakazawa | Kazutaka Kinugawa | Hideya Mino | Isao Goto | Raj Dabre | Shohei Higashiyama | Shantipriya Parida | Makoto Morishita | Ondřej Bojar | Akiko Eriguchi | Yusuke Oda | Chenhui Chu | Sadao Kurohashi
Proceedings of the 10th Workshop on Asian Translation

This paper presents the results of the shared tasks from the 10th workshop on Asian translation (WAT2023). For the WAT2023, 2 teams submitted their translation results for the human evaluation. We also accepted 1 research paper. About 40 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2022

Are Prompt-based Models Clueless?
Pride Kavumba | Ryo Takahashi | Yusuke Oda
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Finetuning large pre-trained language models with a task-specific head has advanced the state-of-the-art on many natural language understanding benchmarks. However, models with a task-specific head require a lot of training data, making them susceptible to learning and exploiting dataset-specific superficial cues that do not generalize to other datasets. Prompting has reduced the data requirement by reusing the language model head and formatting the task input to match the pre-training objective. Therefore, it is expected that few-shot prompt-based models do not exploit superficial cues. This paper presents an empirical examination of whether few-shot prompt-based models also exploit superficial cues. Analyzing few-shot prompt-based models on MNLI, SNLI, HANS, and COPA has revealed that prompt-based models also exploit superficial cues. While the models perform well on instances with superficial cues, they often underperform or only marginally outperform random accuracy on instances without superficial cues.

Overview of the 9th Workshop on Asian Translation
Toshiaki Nakazawa | Hideya Mino | Isao Goto | Raj Dabre | Shohei Higashiyama | Shantipriya Parida | Anoop Kunchukuttan | Makoto Morishita | Ondřej Bojar | Chenhui Chu | Akiko Eriguchi | Kaori Abe | Yusuke Oda | Sadao Kurohashi
Proceedings of the 9th Workshop on Asian Translation

This paper presents the results of the shared tasks from the 9th workshop on Asian translation (WAT2022). For the WAT2022, 8 teams submitted their translation results for the human evaluation. We also accepted 4 research papers. About 300 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2021

This paper presents the results of the shared tasks from the 8th workshop on Asian translation (WAT2021). For the WAT2021, 28 teams participated in the shared tasks and 24 teams submitted their translation results for the human evaluation. We also accepted 5 research papers. About 2,100 translation results were submitted to the automatic evaluation server, and selected submissions were manually evaluated.

2020

TDDC: Timely Disclosure Documents Corpus
Nobushige Doi | Yusuke Oda | Toshiaki Nakazawa
Proceedings of the Twelfth Language Resources and Evaluation Conference

In this paper, we describe the details of the Timely Disclosure Documents Corpus (TDDC). TDDC was prepared by manually aligning the sentences from past Japanese and English timely disclosure documents in PDF format published by companies listed on the Tokyo Stock Exchange. TDDC consists of approximately 1.4 million parallel sentences in Japanese and English. TDDC was used as the official dataset for the 6th Workshop on Asian Translation to encourage the development of machine translation.

Proceedings of the Fourth Workshop on Neural Generation and Translation
Alexandra Birch | Andrew Finch | Hiroaki Hayashi | Kenneth Heafield | Marcin Junczys-Dowmunt | Ioannis Konstas | Xian Li | Graham Neubig | Yusuke Oda
Proceedings of the Fourth Workshop on Neural Generation and Translation

Findings of the Fourth Workshop on Neural Generation and Translation
Kenneth Heafield | Hiroaki Hayashi | Yusuke Oda | Ioannis Konstas | Andrew Finch | Graham Neubig | Xian Li | Alexandra Birch
Proceedings of the Fourth Workshop on Neural Generation and Translation

We describe the finding of the Fourth Workshop on Neural Generation and Translation, held in concert with the annual conference of the Association for Computational Linguistics (ACL 2020). First, we summarize the research trends of papers presented in the proceedings. Second, we describe the results of the three shared tasks 1) efficient neural machine translation (NMT) where participants were tasked with creating NMT systems that are both accurate and efficient, and 2) document-level generation and translation (DGT) where participants were tasked with developing systems that generate summaries from structured data, potentially with assistance from text in another language and 3) STAPLE task: creation of as many possible translations of a given input text. This last shared task was organised by Duolingo.

2019

Proceedings of the 6th Workshop on Asian Translation
Toshiaki Nakazawa | Chenchen Ding | Raj Dabre | Anoop Kunchukuttan | Nobushige Doi | Yusuke Oda | Ondřej Bojar | Shantipriya Parida | Isao Goto | Hidaya Mino
Proceedings of the 6th Workshop on Asian Translation

Overview of the 6th Workshop on Asian Translation
Toshiaki Nakazawa | Nobushige Doi | Shohei Higashiyama | Chenchen Ding | Raj Dabre | Hideya Mino | Isao Goto | Win Pa Pa | Anoop Kunchukuttan | Yusuke Oda | Shantipriya Parida | Ondřej Bojar | Sadao Kurohashi
Proceedings of the 6th Workshop on Asian Translation

This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks and Ru↔Ja news commentary translation task. For the WAT2019, 25 teams participated in the shared tasks. We also received 10 research paper submissions out of which 61 were accepted. About 400 translation results were submitted to the automatic evaluation server, and selected submis- sions were manually evaluated.

Proceedings of the 3rd Workshop on Neural Generation and Translation
Alexandra Birch | Andrew Finch | Hiroaki Hayashi | Ioannis Konstas | Thang Luong | Graham Neubig | Yusuke Oda | Katsuhito Sudoh
Proceedings of the 3rd Workshop on Neural Generation and Translation

Findings of the Third Workshop on Neural Generation and Translation
Hiroaki Hayashi | Yusuke Oda | Alexandra Birch | Ioannis Konstas | Andrew Finch | Minh-Thang Luong | Graham Neubig | Katsuhito Sudoh
Proceedings of the 3rd Workshop on Neural Generation and Translation

This document describes the findings of the Third Workshop on Neural Generation and Translation, held in concert with the annual conference of the Empirical Methods in Natural Language Processing (EMNLP 2019). First, we summarize the research trends of papers presented in the proceedings. Second, we describe the results of the two shared tasks 1) efficient neural machine translation (NMT) where participants were tasked with creating NMT systems that are both accurate and efficient, and 2) document generation and translation (DGT) where participants were tasked with developing systems that generate summaries from structured data, potentially with assistance from text in another language.

2018

Proceedings of the 2nd Workshop on Neural Machine Translation and Generation
Alexandra Birch | Andrew Finch | Thang Luong | Graham Neubig | Yusuke Oda
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

Findings of the Second Workshop on Neural Machine Translation and Generation
Alexandra Birch | Andrew Finch | Minh-Thang Luong | Graham Neubig | Yusuke Oda
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

This document describes the findings of the Second Workshop on Neural Machine Translation and Generation, held in concert with the annual conference of the Association for Computational Linguistics (ACL 2018). First, we summarize the research trends of papers presented in the proceedings, and note that there is particular interest in linguistic structure, domain adaptation, data augmentation, handling inadequate resources, and analysis of models. Second, we describe the results of the workshop’s shared task on efficient neural machine translation, where participants were tasked with creating MT systems that are both accurate and efficient.

2017

Neural Machine Translation via Binary Code Prediction
Yusuke Oda | Philip Arthur | Graham Neubig | Koichiro Yoshino | Satoshi Nakamura
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In this paper, we propose a new method for calculating the output layer in neural machine translation systems. The method is based on predicting a binary code for each word and can reduce computation time/memory requirements of the output layer to be logarithmic in vocabulary size in the best case. In addition, we also introduce two advanced approaches to improve the robustness of the proposed model: using error-correcting codes and combining softmax and binary codes. Experiments on two English-Japanese bidirectional translation tasks show proposed models achieve BLEU scores that approach the softmax, while reducing memory usage to the order of less than 1/10 and improving decoding speed on CPUs by x5 to x10.

An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation
Makoto Morishita | Yusuke Oda | Graham Neubig | Koichiro Yoshino | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the First Workshop on Neural Machine Translation

Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling.

Overview of the 4th Workshop on Asian Translation
Toshiaki Nakazawa | Shohei Higashiyama | Chenchen Ding | Hideya Mino | Isao Goto | Hideto Kazawa | Yusuke Oda | Graham Neubig | Sadao Kurohashi
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

This paper presents the results of the shared tasks from the 4th workshop on Asian translation (WAT2017) including J↔E, J↔C scientific paper translation subtasks, C↔J, K↔J, E↔J patent translation subtasks, H↔E mixed domain subtasks, J↔E newswire subtasks and J↔E recipe subtasks. For the WAT2017, 12 institutions participated in the shared tasks. About 300 translation results have been submitted to the automatic evaluation server, and selected submissions were manually evaluated.

A Simple and Strong Baseline: NAIST-NICT Neural Machine Translation System for WAT2017 English-Japanese Translation Task
Yusuke Oda | Katsuhito Sudoh | Satoshi Nakamura | Masao Utiyama | Eiichiro Sumita
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

This paper describes the details about the NAIST-NICT machine translation system for WAT2017 English-Japanese Scientific Paper Translation Task. The system consists of a language-independent tokenizer and an attentional encoder-decoder style neural machine translation model. According to the official results, our system achieves higher translation accuracy than any systems submitted previous campaigns despite simple model architecture.

2016

Phrase-based Machine Translation using Multiple Preordering Candidates
Yusuke Oda | Taku Kudo | Tetsuji Nakagawa | Taro Watanabe
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we propose a new decoding method for phrase-based statistical machine translation which directly uses multiple preordering candidates as a graph structure. Compared with previous phrase-based decoding methods, our method is based on a simple left-to-right dynamic programming in which no decoding-time reordering is performed. As a result, its runtime is very fast and implementing the algorithm becomes easy. Our system does not depend on specific preordering methods as long as they output multiple preordering candidates, and it is trivial to employ existing preordering methods into our system. In our experiments for translating diverse 11 languages into English, the proposed method outperforms conventional phrase-based decoder in terms of translation qualities under comparable or faster decoding time.

2015

Ckylark: A More Robust PCFG-LA Parser
Yusuke Oda | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations

Syntax-based Simultaneous Translation through Prediction of Unseen Syntactic Constituents
Yusuke Oda | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

The NAIST-NTT TED talk treebank
Graham Neubig | Katsuhiro Sudoh | Yusuke Oda | Kevin Duh | Hajime Tsukuda | Masaaki Nagata
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers

Syntactic parsing is a fundamental natural language processing technology that has proven useful in machine translation, language modeling, sentence segmentation, and a number of other applications related to speech translation. However, there is a paucity of manually annotated syntactic parsing resources for speech, and particularly for the lecture speech that is the current target of the IWSLT translation campaign. In this work, we present a new manually annotated treebank of TED talks that we hope will prove useful for investigation into the interaction between syntax and these speechrelated applications. The first version of the corpus includes 1,217 sentences and 23,158 words manually annotated with parse trees, and aligned with translations in 26-43 different languages. In this paper we describe the collection of the corpus, and an analysis of its various characteristics.

Optimizing Segmentation Strategies for Simultaneous Speech Translation
Yusuke Oda | Graham Neubig | Sakriani Sakti | Tomoki Toda | Satoshi Nakamura
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Co-authors

Shohei Higashiyama 7

Shantipriya Parida 7

Alexandra Birch 6

Satoshi Nakamura 6

Chenchen Ding 5

Akiko Eriguchi 5

Anoop Kunchukuttan 5

Katsuhito Sudoh 5

Hiroaki Hayashi 4

Ioannis Konstas 4

Minh-Thang Luong 4

Makoto Morishita 4

Nobushige Doi 3

Sakriani Sakti 3

Kenneth Heafield 2

Daisuke Kawahara 2

Kazutaka Kinugawa 2

Hideki Nakayama 2

Koichiro Yoshino 2

Philip Arthur 1

Pushpak Bhattacharyya 1

Masaru Isonuma 1

Junfeng Jiang 1

Marcin Junczys-Dowmunt 1

Pride Kavumba 1

Hideto Kazawa 1

Kazuma Kobayashi 1

Shuhei Kurita 1

Hiroshi Manabe 1

Masaaki Nagata 1

Tetsuji Nakagawa 1

Naoaki Okazaki 1

Katsuhiro Sudoh 1

Eiichiro Sumita 1

Ryo Takahashi 1

Hajime Tsukuda 1

Masao Utiyama 1

Taro Watanabe 1

Yusuke Yamauchi 1

Venues