Kuniko Saito
2026
Let’s Put Ourselves in Sally’s Shoes: Shoes-of-Others Prefilling Improves Theory of Mind in Large Language Models
Kazutoshi Shinoda | Nobukatsu Hojo | Kyosuke Nishida | Yoshihiro Yamazaki | Keita Suzuki | Hiroaki Sugiyama | Kuniko Saito
Findings of the Association for Computational Linguistics: EACL 2026
Kazutoshi Shinoda | Nobukatsu Hojo | Kyosuke Nishida | Yoshihiro Yamazaki | Keita Suzuki | Hiroaki Sugiyama | Kuniko Saito
Findings of the Association for Computational Linguistics: EACL 2026
Recent studies have shown that Theory of Mind (ToM) in large language models (LLMs) has not reached human-level performance yet. Since fine-tuning LLMs on ToM datasets often degrades their generalization, several inference-time methods have been proposed to enhance ToM in LLMs. However, existing inference-time methods for ToM are specialized for inferring beliefs from contexts involving changes in the world state. In this study, we present a new inference-time method for ToM, Shoes-of-Others (SoO) prefilling, which makes fewer assumptions about contexts and is applicable to broader scenarios. SoO prefilling simply specifies the beginning of LLM outputs with “Let’s put ourselves in A’s shoes.”, where A denotes the target character’s name. We evaluate SoO prefilling on two benchmarks that assess ToM in conversational and narrative contexts without changes in the world state and find that it consistently improves ToM across five categories of mental states. Our analysis suggests that SoO prefilling elicits faithful thoughts, thereby improving the ToM performance.
2024
Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes
Kosuke Nishida | Kyosuke Nishida | Kuniko Saito
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Kosuke Nishida | Kyosuke Nishida | Kuniko Saito
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Loss spikes, a phenomenon in which the loss value diverges suddenly, is a fundamental issue in the pre-training of large language models. This paper supposes that the non-uniformity of the norm of the parameters is one of the causes of loss spikes. Here, in training of neural networks, the scale of the gradients is required to be kept constant throughout the layers to avoid the vanishing and exploding gradients problem. However, to meet these requirements in the Transformer model, the norm of the model parameters must be non-uniform, and thus, parameters whose norm is smaller are more sensitive to the parameter update. To address this issue, we propose a novel technique, weight scaling as reparameterization (WeSaR). WeSaR introduces a gate parameter per parameter matrix and adjusts it to the value satisfying the requirements. Because of the gate parameter, WeSaR sets the norm of the original parameters uniformly, which results in stable training. Experimental results with the Transformer decoders consisting of 130 million, 1.3 billion, and 13 billion parameters showed that WeSaR stabilizes and accelerates training and that it outperformed compared methods including popular initialization methods.
2023
DueT: Image-Text Contrastive Transfer Learning with Dual-adapter Tuning
Taku Hasegawa | Kyosuke Nishida | Koki Maeda | Kuniko Saito
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Taku Hasegawa | Kyosuke Nishida | Koki Maeda | Kuniko Saito
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
This paper presents DueT, a novel transfer learning method for vision and language models built by contrastive learning. In DueT, adapters are inserted into the image and text encoders, which have been initialized using models pre-trained on uni-modal corpora and then frozen. By training only these adapters, DueT enables efficient learning with a reduced number of trainable parameters. Moreover, unlike traditional adapters, those in DueT are equipped with a gating mechanism, enabling effective transfer and connection of knowledge acquired from pre-trained uni-modal encoders while preventing catastrophic forgetting. We report that DueT outperformed simple fine-tuning, the conventional method fixing only the image encoder and training only the text encoder, and the LoRA-based adapter method in accuracy and parameter efficiency for 0-shot image and text retrieval in both English and Japanese domains.
2022
Combining Argumentation Structure and Language Model for Generating Natural Argumentative Dialogue
Koh Mitsuda | Ryuichiro Higashinaka | Kuniko Saito
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Koh Mitsuda | Ryuichiro Higashinaka | Kuniko Saito
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
Argumentative dialogue is an important process where speakers discuss a specific theme for consensus building or decision making. In previous studies for generating consistent argumentative dialogue, retrieval-based methods with hand-crafted argumentation structures have been used. In this study, we propose a method to generate natural argumentative dialogues by combining an argumentation structure and language model. We trained the language model to rewrite a proposition of an argumentation structure on the basis of its information, such as keywords and stance, into the next utterance while considering its context, and we used the model to rewrite propositions in the argumentation structure. We manually evaluated the generated dialogues and found that the proposed method significantly improved the naturalness of dialogues without losing consistency of argumentation.
2017
Automatically Extracting Variant-Normalization Pairs for Japanese Text Normalization
Itsumi Saito | Kyosuke Nishida | Kugatsu Sadamitsu | Kuniko Saito | Junji Tomita
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Itsumi Saito | Kyosuke Nishida | Kugatsu Sadamitsu | Kuniko Saito | Junji Tomita
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Social media texts, such as tweets from Twitter, contain many types of non-standard tokens, and the number of normalization approaches for handling such noisy text has been increasing. We present a method for automatically extracting pairs of a variant word and its normal form from unsegmented text on the basis of a pair-wise similarity approach. We incorporated the acquired variant-normalization pairs into Japanese morphological analysis. The experimental results show that our method can extract widely covered variants from large Twitter data and improve the recall of normalization without degrading the overall accuracy of Japanese morphological analysis.
2012
Creating an Extended Named Entity Dictionary from Wikipedia
Ryuichiro Higashinaka | Kugatsu Sadamitsu | Kuniko Saito | Toshiro Makino | Yoshihiro Matsuo
Proceedings of COLING 2012
Ryuichiro Higashinaka | Kugatsu Sadamitsu | Kuniko Saito | Toshiro Makino | Yoshihiro Matsuo
Proceedings of COLING 2012
Constructing a Class-Based Lexical Dictionary using Interactive Topic Models
Kugatsu Sadamitsu | Kuniko Saito | Kenji Imamura | Yoshihiro Matsuo
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Kugatsu Sadamitsu | Kuniko Saito | Kenji Imamura | Yoshihiro Matsuo
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper proposes a new method of constructing arbitrary class-based related word dictionaries on interactive topic models; we assume that each class is described by a topic. We propose a new semi-supervised method that uses the simplest topic model yielded by the standard EM algorithm; model calculation is very rapid. Furthermore our approach allows a dictionary to be modified interactively and the final dictionary has a hierarchical structure. This paper makes three contributions. First, it proposes a word-based semi-supervised topic model. Second, we apply the semi-supervised topic model to interactive learning; this approach is called the Interactive Topic Model. Third, we propose a score function; it extracts the related words that occupy the middle layer of the hierarchical structure. Experiments show that our method can appropriately retrieve the words belonging to an arbitrary class.
Grammar Error Correction Using Pseudo-Error Sentences and Domain Adaptation
Kenji Imamura | Kuniko Saito | Kugatsu Sadamitsu | Hitoshi Nishikawa
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Kenji Imamura | Kuniko Saito | Kugatsu Sadamitsu | Hitoshi Nishikawa
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Entity Set Expansion using Interactive Topic Information
Kugatsu Sadamitsu | Kuniko Saito | Kenji Imamura | Yoshihiro Matsuo
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation
Kugatsu Sadamitsu | Kuniko Saito | Kenji Imamura | Yoshihiro Matsuo
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation
2011
Entity Set Expansion using Topic information
Kugatsu Sadamitsu | Kuniko Saito | Kenji Imamura | Genichiro Kikui
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Kugatsu Sadamitsu | Kuniko Saito | Kenji Imamura | Genichiro Kikui
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
2009
Discriminative Approach to Predicate-Argument Structure Analysis with Zero-Anaphora Resolution
Kenji Imamura | Kuniko Saito | Tomoko Izumi
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Kenji Imamura | Kuniko Saito | Tomoko Izumi
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Tag Confidence Measure for Semi-Automatically Updating Named Entity Recognition
Kuniko Saito | Kenji Imamura
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
Kuniko Saito | Kenji Imamura
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
2006
A Clustered Global Phrase Reordering Model for Statistical Machine Translation
Masaaki Nagata | Kuniko Saito | Kazuhide Yamamoto | Kazuteru Ohashi
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
Masaaki Nagata | Kuniko Saito | Kazuhide Yamamoto | Kazuteru Ohashi
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
2005
NUT-NTT Statistical Machine Translation System for IWSLT 2005
Kazuteru Ohashi | Kazuhide Yamamoto | Kuniko Saito | Masaaki Nagata
Proceedings of the Second International Workshop on Spoken Language Translation
Kazuteru Ohashi | Kazuhide Yamamoto | Kuniko Saito | Masaaki Nagata
Proceedings of the Second International Workshop on Spoken Language Translation
Portable Translator Capable of Recognizing Characters on Signboard and Menu Captured by its Built-in Camera
Hideharu Nakajima | Yoshihiro Matsuo | Masaaki Nagata | Kuniko Saito
Proceedings of the ACL Interactive Poster and Demonstration Sessions
Hideharu Nakajima | Yoshihiro Matsuo | Masaaki Nagata | Kuniko Saito
Proceedings of the ACL Interactive Poster and Demonstration Sessions
2003
Search
Fix author
Co-authors
- Kenji Imamura 6
- Kugatsu Sadamitsu 6
- Yoshihiro Matsuo 4
- Masaaki Nagata 4
- Kyosuke Nishida 4
- Ryuichiro Higashinaka 2
- Kazuteru Ohashi 2
- Kazuhide Yamamoto 2
- Taku Hasegawa 1
- Nobukatsu Hojo 1
- Tomoko Izumi 1
- Genichiro Kikui 1
- Koki Maeda 1
- Toshiro Makino 1
- Koh Mitsuda 1
- Hideharu Nakajima 1
- Kosuke Nishida 1
- Hitoshi Nishikawa 1
- Itsumi Saito 1
- Kazutoshi Shinoda 1
- Hiroaki Sugiyama 1
- Keita Suzuki 1
- Junji Tomita 1
- Yoshihiro Yamazaki 1