2025
Understanding the Side Effects of Rank-One Knowledge Editing
Ryosuke Takahashi | Go Kamoda | Benjamin Heinzerling | Keisuke Sakaguchi | Kentaro Inui
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
This study conducts a detailed analysis of the side effects of rank-one knowledge editing using language models with controlled knowledge. The analysis focuses on each element of knowledge triples (subject, relation, object) and examines two aspects: “knowledge that causes large side effects when edited” and “knowledge that is affected by the side effects.” Our findings suggest that editing knowledge with subjects that have relationships with numerous objects or are robustly embedded within the LM may trigger extensive side effects. Furthermore, we demonstrate that the similarity between relation vectors, the density of object vectors, and the distortion of knowledge representations are closely related to how susceptible knowledge is to editing influences. The findings of this research provide new insights into the mechanisms of side effects in LM knowledge editing and indicate specific directions for developing more effective and reliable knowledge editing methods.
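As a rough illustration of the mechanism under study, here is a minimal numpy sketch of a generic rank-one edit (in the spirit of ROME-style methods, not this paper's exact procedure): the update maps one key vector k to a new value v exactly, and every other key drifts in proportion to its overlap with k, which is one route by which such side effects arise.

import numpy as np

def rank_one_edit(W, k, v):
    # Rank-one perturbation that maps key k to value v exactly:
    # W' = W + (v - W k) k^T / (k^T k).
    return W + np.outer(v - W @ k, k) / (k @ k)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
k, v = rng.normal(size=16), rng.normal(size=8)
W_edit = rank_one_edit(W, k, v)
assert np.allclose(W_edit @ k, v)            # the edit is inserted exactly

# Side effect: any other key k2 drifts in proportion to its overlap with k.
k2 = rng.normal(size=16)
drift = np.linalg.norm(W_edit @ k2 - W @ k2)
overlap = abs(k2 @ k) / (k @ k)
print(drift, overlap * np.linalg.norm(v - W @ k))  # equal by construction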
Quantifying the Influence of Evaluation Aspects on Long-Form Response Assessment
Go Kamoda | Akari Asai | Ana Brassard | Keisuke Sakaguchi
Proceedings of the 31st International Conference on Computational Linguistics
Evaluating the outputs of large language models (LLMs) on long-form generative tasks remains challenging. While fine-grained, aspect-wise evaluations provide valuable diagnostic information, they are difficult to design exhaustively, and each aspect’s contribution to the overall acceptability of an answer is unclear. In this study, we propose a method to compute an overall quality score as a weighted average of three key aspects: factuality, informativeness, and formality. This approach achieves stronger correlations with human judgments compared to previous metrics. Our analysis identifies factuality as the most predictive aspect of overall quality. Additionally, we release a dataset of 1.2k long-form QA answers annotated with both absolute judgments and relative preferences in overall and aspect-wise schemes to aid future research in evaluation practices.
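A minimal sketch of the weighted-average idea (the aspect scores and human judgments below are invented toy numbers, and least-squares weight fitting is an assumption, not necessarily the paper's estimation procedure):

import numpy as np

# Toy aspect ratings: columns = factuality, informativeness, formality.
A = np.array([[5, 4, 3],
              [2, 5, 4],
              [4, 2, 5],
              [5, 5, 4],
              [1, 3, 2]], dtype=float)
y = np.array([4.6, 3.0, 3.6, 4.8, 1.6])     # human overall judgments

# Fit one weight per aspect, then score answers as a weighted average.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
overall = A @ w
print(w, overall)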
Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference
Go Kamoda | Benjamin Heinzerling | Tatsuro Inaba | Keito Kudo | Keisuke Sakaguchi | Kentaro Inui
Findings of the Association for Computational Linguistics: NAACL 2025
According to the stages-of-inference hypothesis, early layers of language models map their subword-tokenized input, which does not necessarily correspond to a linguistically meaningful segmentation, to more meaningful representations that form the model’s “inner vocabulary”. Prior analysis of this *detokenization* stage has predominantly relied on probing and interventions such as path patching, which involve selecting particular inputs, choosing a subset of components that will be patched, and then observing changes in model behavior. Here, we show that several important aspects of the detokenization stage can be understood purely by analyzing model weights, without performing any model inference steps. Specifically, we introduce an analytical decomposition of first-layer attention in GPT-2. Our decomposition yields interpretable terms that quantify the relative contributions of position-related, token-related, and mixed effects. By focusing on terms in this decomposition, we discover weight-based explanations of attention bias toward close tokens and attention for detokenization.
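A hedged weight-only sketch of this kind of decomposition, using the transformers library; this simplified version ignores the pre-attention LayerNorm and the c_attn bias, which a full analysis would have to account for, so it illustrates the idea rather than reproducing the paper's terms:

import torch
from transformers import GPT2Model

model = GPT2Model.from_pretrained("gpt2")
d, n_head = model.config.n_embd, model.config.n_head   # 768, 12
head_dim = d // n_head                                  # 64

E = model.wte.weight        # (50257, 768) token embedding matrix
P = model.wpe.weight        # (1024, 768) positional embedding matrix

# First-layer attention projections; GPT-2's Conv1D stores the weight as
# (d, 3d), so the columns split into query, key, value blocks.
W_q, W_k, _ = model.h[0].attn.c_attn.weight.split(d, dim=1)

h = 0                                        # examine a single head
Wq_h = W_q[:, h * head_dim:(h + 1) * head_dim]
Wk_h = W_k[:, h * head_dim:(h + 1) * head_dim]

with torch.no_grad():
    A = Wq_h @ Wk_h.T / head_dim ** 0.5      # bilinear form of head h
    # With query x_i = e_i + p_i and key x_j = e_j + p_j, the pre-softmax
    # score decomposes as e A e^T + e A p^T + p A e^T + p A p^T
    # (token-token, mixed, and position-position terms).
    # The position-position term needs no input at all:
    pos_pos = P[:128] @ A @ P[:128].T        # (128, 128) positional logit map
    print(pos_pos.diagonal(-1).mean(), pos_pos.diagonal(-64).mean())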
How a Bilingual LM Becomes Bilingual: Tracing Internal Representations with Sparse Autoencoders
Tatsuro Inaba | Go Kamoda | Kentaro Inui | Masaru Isonuma | Yusuke Miyao | Yohei Oseki | Yu Takagi | Benjamin Heinzerling
Findings of the Association for Computational Linguistics: EMNLP 2025
This study explores how bilingual language models develop complex internal representations. We employ sparse autoencoders to analyze internal representations of bilingual language models with a focus on the effects of training steps, layers, and model sizes. Our analysis shows that language models first learn languages separately, and then gradually form bilingual alignments, particularly in the mid layers. We also found that this bilingual tendency is stronger in larger models. Building on these findings, we demonstrate the critical role of bilingual representations in model performance by employing a novel method that integrates decomposed representations from a fully trained model into a mid-training model. Our results provide insights into how language models acquire bilingual capabilities.
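For readers unfamiliar with the tool, a minimal PyTorch sketch of a sparse autoencoder of the kind used for such analyses (dictionary size, L1 coefficient, and the ReLU/L1 formulation are common defaults, not necessarily this paper's configuration):

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete dictionary: encode activations into many sparsely
    active features, then decode back to the original space."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_dict)
        self.dec = nn.Linear(d_dict, d_model, bias=False)

    def forward(self, x):
        z = torch.relu(self.enc(x))          # sparse feature activations
        return self.dec(z), z

sae = SparseAutoencoder(d_model=768, d_dict=8 * 768)
x = torch.randn(32, 768)                     # stand-in for LM hidden states
x_hat, z = sae(x)
loss = ((x - x_hat) ** 2).mean() + 1e-3 * z.abs().mean()  # recon + L1 sparsity
loss.backward()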
2023
Test-time Augmentation for Factual Probing
Go Kamoda | Benjamin Heinzerling | Keisuke Sakaguchi | Kentaro Inui
Findings of the Association for Computational Linguistics: EMNLP 2023
Factual probing is a method that uses prompts to test if a language model “knows” certain world knowledge facts. A problem in factual probing is that small changes to the prompt can lead to large changes in model output. Previous work aimed to alleviate this problem by optimizing prompts via text mining or fine-tuning. However, such approaches are relation-specific and do not generalize to unseen relation types. Here, we propose to use test-time augmentation (TTA) as a relation-agnostic method for reducing sensitivity to prompt variations by automatically augmenting and ensembling prompts at test time. Experiments show improved model calibration, i.e., with TTA, model confidence better reflects prediction accuracy. Improvements in prediction accuracy are observed for some models, but for other models, TTA leads to degradation. Error analysis identifies the difficulty of producing high-quality prompt variations as the main challenge for TTA.
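A minimal sketch of the ensembling idea behind TTA (the hand-written paraphrases below stand in for the paper's automatic augmentation, and averaging next-token distributions is one simple ensembling choice):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Illustrative paraphrases of one factual prompt; in the paper these
# variations are generated automatically.
prompts = [
    "The capital of France is",
    "France's capital city is",
    "The capital city of France is",
]

probs = []
with torch.no_grad():
    for p in prompts:
        ids = tok(p, return_tensors="pt").input_ids
        next_logits = model(ids).logits[0, -1]
        probs.append(next_logits.softmax(dim=-1))

# Ensemble: average the next-token distributions over all prompt variants,
# then predict from the averaged distribution.
avg = torch.stack(probs).mean(dim=0)
print(tok.decode([avg.argmax().item()]))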