Kuniko Saito

2026

Let’s Put Ourselves in Sally’s Shoes: Shoes-of-Others Prefilling Improves Theory of Mind in Large Language Models
Kazutoshi Shinoda | Nobukatsu Hojo | Kyosuke Nishida | Yoshihiro Yamazaki | Keita Suzuki | Hiroaki Sugiyama | Kuniko Saito
Findings of the Association for Computational Linguistics: EACL 2026

Recent studies have shown that Theory of Mind (ToM) in large language models (LLMs) has not reached human-level performance yet. Since fine-tuning LLMs on ToM datasets often degrades their generalization, several inference-time methods have been proposed to enhance ToM in LLMs. However, existing inference-time methods for ToM are specialized for inferring beliefs from contexts involving changes in the world state. In this study, we present a new inference-time method for ToM, Shoes-of-Others (SoO) prefilling, which makes fewer assumptions about contexts and is applicable to broader scenarios. SoO prefilling simply specifies the beginning of LLM outputs with “Let’s put ourselves in A’s shoes.”, where A denotes the target character’s name. We evaluate SoO prefilling on two benchmarks that assess ToM in conversational and narrative contexts without changes in the world state and find that it consistently improves ToM across five categories of mental states. Our analysis suggests that SoO prefilling elicits faithful thoughts, thereby improving the ToM performance.

2024

pdf bib abs

Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes
Kosuke Nishida | Kyosuke Nishida | Kuniko Saito
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Loss spikes, a phenomenon in which the loss value diverges suddenly, is a fundamental issue in the pre-training of large language models. This paper supposes that the non-uniformity of the norm of the parameters is one of the causes of loss spikes. Here, in training of neural networks, the scale of the gradients is required to be kept constant throughout the layers to avoid the vanishing and exploding gradients problem. However, to meet these requirements in the Transformer model, the norm of the model parameters must be non-uniform, and thus, parameters whose norm is smaller are more sensitive to the parameter update. To address this issue, we propose a novel technique, weight scaling as reparameterization (WeSaR). WeSaR introduces a gate parameter per parameter matrix and adjusts it to the value satisfying the requirements. Because of the gate parameter, WeSaR sets the norm of the original parameters uniformly, which results in stable training. Experimental results with the Transformer decoders consisting of 130 million, 1.3 billion, and 13 billion parameters showed that WeSaR stabilizes and accelerates training and that it outperformed compared methods including popular initialization methods.

2023

pdf bib abs

DueT: Image-Text Contrastive Transfer Learning with Dual-adapter Tuning
Taku Hasegawa | Kyosuke Nishida | Koki Maeda | Kuniko Saito
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

This paper presents DueT, a novel transfer learning method for vision and language models built by contrastive learning. In DueT, adapters are inserted into the image and text encoders, which have been initialized using models pre-trained on uni-modal corpora and then frozen. By training only these adapters, DueT enables efficient learning with a reduced number of trainable parameters. Moreover, unlike traditional adapters, those in DueT are equipped with a gating mechanism, enabling effective transfer and connection of knowledge acquired from pre-trained uni-modal encoders while preventing catastrophic forgetting. We report that DueT outperformed simple fine-tuning, the conventional method fixing only the image encoder and training only the text encoder, and the LoRA-based adapter method in accuracy and parameter efficiency for 0-shot image and text retrieval in both English and Japanese domains.

2022

pdf bib abs

Combining Argumentation Structure and Language Model for Generating Natural Argumentative Dialogue
Koh Mitsuda | Ryuichiro Higashinaka | Kuniko Saito
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Argumentative dialogue is an important process where speakers discuss a specific theme for consensus building or decision making. In previous studies for generating consistent argumentative dialogue, retrieval-based methods with hand-crafted argumentation structures have been used. In this study, we propose a method to generate natural argumentative dialogues by combining an argumentation structure and language model. We trained the language model to rewrite a proposition of an argumentation structure on the basis of its information, such as keywords and stance, into the next utterance while considering its context, and we used the model to rewrite propositions in the argumentation structure. We manually evaluated the generated dialogues and found that the proposed method significantly improved the naturalness of dialogues without losing consistency of argumentation.

2017

pdf bib abs

Automatically Extracting Variant-Normalization Pairs for Japanese Text Normalization
Itsumi Saito | Kyosuke Nishida | Kugatsu Sadamitsu | Kuniko Saito | Junji Tomita
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Social media texts, such as tweets from Twitter, contain many types of non-standard tokens, and the number of normalization approaches for handling such noisy text has been increasing. We present a method for automatically extracting pairs of a variant word and its normal form from unsegmented text on the basis of a pair-wise similarity approach. We incorporated the acquired variant-normalization pairs into Japanese morphological analysis. The experimental results show that our method can extract widely covered variants from large Twitter data and improve the recall of normalization without degrading the overall accuracy of Japanese morphological analysis.

2012

pdf bib

Entity Set Expansion using Interactive Topic Information
Kugatsu Sadamitsu | Kuniko Saito | Kenji Imamura | Yoshihiro Matsuo
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

pdf bib

Creating an Extended Named Entity Dictionary from Wikipedia
Ryuichiro Higashinaka | Kugatsu Sadamitsu | Kuniko Saito | Toshiro Makino | Yoshihiro Matsuo
Proceedings of COLING 2012

pdf bib

Grammar Error Correction Using Pseudo-Error Sentences and Domain Adaptation
Kenji Imamura | Kuniko Saito | Kugatsu Sadamitsu | Hitoshi Nishikawa
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf bib abs

Constructing a Class-Based Lexical Dictionary using Interactive Topic Models
Kugatsu Sadamitsu | Kuniko Saito | Kenji Imamura | Yoshihiro Matsuo
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper proposes a new method of constructing arbitrary class-based related word dictionaries on interactive topic models; we assume that each class is described by a topic. We propose a new semi-supervised method that uses the simplest topic model yielded by the standard EM algorithm; model calculation is very rapid. Furthermore our approach allows a dictionary to be modified interactively and the final dictionary has a hierarchical structure. This paper makes three contributions. First, it proposes a word-based semi-supervised topic model. Second, we apply the semi-supervised topic model to interactive learning; this approach is called the Interactive Topic Model. Third, we propose a score function; it extracts the related words that occupy the middle layer of the hierarchical structure. Experiments show that our method can appropriately retrieve the words belonging to an arbitrary class.