Makoto Yamada

2025

In-context learning (ICL) has emerged as a capability of large language models (LLMs), enabling them to adapt to new tasks using provided examples. While ICL has demonstrated its strong effectiveness, there is limited understanding of its vulnerability against potential threats. This paper examines ICL’s vulnerability to data poisoning attacks. We introduce ICLPoison, an attacking method specially designed to exploit ICL’s unique learning mechanisms by identifying discrete text perturbations that influence LLM hidden states. We propose three representative attack strategies, evaluated across various models and tasks. Our experiments, including those on GPT-4, show that ICL performance can be significantly compromised by these attacks, highlighting the urgent need for improved defense mechanisms to protect LLMs’ integrity and reliability.

pdf bib abs

When LRP Diverges from Leave-One-Out in Transformers
Weiqiu You | Siqi Zeng | Yao-Hung Hubert Tsai | Makoto Yamada | Han Zhao
Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP

Leave-One-Out (LOO) provides an intuitive measure of feature importance but is computationally prohibitive. While Layer-Wise Relevance Propagation (LRP) offers a potentially efficient alternative, its axiomatic soundness in modern Transformers remains under-examined. In this work, we first show that the bilinear propagation rules used in recent advances of AttnLRP violate implementation invariance. We prove this analytically and confirm it empirically in linear attention layers. Second, we also revisit CP-LRP as a diagnostic baseline and find that bypassing relevance propagation through the softmax layer—back-propagating relevance only through the value matrices—significantly improves alignment with LOO, particularly in the middle-to-late Transformer layers. Overall, our results suggest that (i) bilinear factorization sensitivity and (ii) softmax propagation error potentially jointly undermine LRP’s ability to approximate LOO in Transformers.

2024

pdf bib abs

Large language models (LLMs) are susceptible to a type of attack known as jailbreaking, which misleads LLMs to output harmful contents. Although there are diverse jailbreak attack strategies, there is no unified understanding on why some methods succeed and others fail. This paper explores the behavior of harmful and harmless prompts in the LLM’s representation space to investigate the intrinsic properties of successful jailbreak attacks. We hypothesize that successful attacks share some similar properties: They are effective in moving the representation of the harmful prompt towards the direction to the harmless prompts. We leverage hidden representations into the objective of existing jailbreak attacks to move the attacks along the acceptance direction, and conduct experiments to validate the above hypothesis using the proposed objective. We hope this study provides new insights into understanding how LLMs understand harmfulness information.

2023

pdf bib abs

Large-scale similarity search with Optimal Transport
Cléa Laouar | Yuki Takezawa | Makoto Yamada
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Wasserstein distance is a powerful tool for comparing probability distributions and is widely used for document classification and retrieval tasks in NLP. In particular, it is known as the word mover’s distance (WMD) in the NLP community. WMD exhibits excellent performance for various NLP tasks; however, one of its limitations is its computational cost and thus is not useful for large-scale distribution comparisons. In this study, we propose a simple and effective nearest neighbor search based on the Wasserstein distance. Specifically, we employ the L1 embedding method based on the tree-based Wasserstein approximation and subsequently used the nearest neighbor search to efficiently find the k-nearest neighbors. Through benchmark experiments, we demonstrate that the proposed approximation has comparable performance to the vanilla Wasserstein distance and can be computed three orders of magnitude faster than the vanilla Wasserstein distance.

pdf bib abs

A linear time approximation of Wasserstein distance with word embedding selection
Sho Otao | Makoto Yamada
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Wasserstein distance, which can be computed by solving the optimal transport problem, is a powerful method for measuring the dissimilarity between documents. In the NLP community, it is referred to as word mover’s distance (WMD). One of the key challenges of Wasserstein distance is its computational cost since it needs cubic time. Although the Sinkhorn algorithm is a powerful tool to speed up to compute the Wasserstein distance, it still requires square time. Recently, a linear time approximation of the Wasserstein distance including the sliced Wasserstein and the tree-Wasserstein distance (TWD) has been proposed. However, a linear time approximation method suffers when the dimensionality of word vectors is high. In this study, we propose a method to combine feature selection and tree approximation of Wasserstein distance to handle high-dimensional problems. More specifically, we use multiple word embeddings and automatically select useful word embeddings in a tree approximation of Wasserstein distance. To this end, we approximate Wasserstein distance for each word vector by tree approximation technique, and select the discriminative (i.e., large Wasserstein distance) word embeddings by solving an entropic regularized maximization problem. Through our experiments on document classification, our proposed method achieved high performance.

2021

pdf bib abs

Computationally Efficient Wasserstein Loss for Structured Labels
Ayato Toyokuni | Sho Yokoi | Hisashi Kashima | Makoto Yamada
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

The problem of estimating the probability distribution of labels has been widely studied as a label distribution learning (LDL) problem, whose applications include age estimation, emotion analysis, and semantic segmentation. We propose a tree-Wasserstein distance regularized LDL algorithm, focusing on hierarchical text classification tasks. We propose predicting the entire label hierarchy using neural networks, where the similarity between predicted and true labels is measured using the tree-Wasserstein distance. Through experiments using synthetic and real-world datasets, we demonstrate that the proposed method successfully considers the structure of labels during training, and it compares favorably with the Sinkhorn algorithm in terms of computation time and memory usage.

2019

pdf bib abs

Transformer Dissection: An Unified Understanding for Transformer’s Attention via the Lens of Kernel
Yao-Hung Hubert Tsai | Shaojie Bai | Makoto Yamada | Louis-Philippe Morency | Ruslan Salakhutdinov
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Transformer is a powerful architecture that achieves superior performance on various sequence learning tasks, including neural machine translation, language understanding, and sequence prediction. At the core of the Transformer is the attention mechanism, which concurrently processes all inputs in the streams. In this paper, we present a new formulation of attention via the lens of the kernel. To be more precise, we realize that the attention can be seen as applying kernel smoother over the inputs with the kernel scores being the similarities between inputs. This new formulation gives us a better way to understand individual components of the Transformer’s attention, such as the better way to integrate the positional embedding. Another important advantage of our kernel-based formulation is that it paves the way to a larger space of composing Transformer’s attention. As an example, we propose a new variant of Transformer’s attention which models the input as a product of symmetric kernels. This approach achieves competitive performance to the current state of the art model with less computation. In our experiments, we empirically study different kernel construction strategies on two widely used tasks: neural machine translation and sequence prediction.

2018

pdf bib abs

Learning Unsupervised Word Translations Without Adversaries
Tanmoy Mukherjee | Makoto Yamada | Timothy Hospedales
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Word translation, or bilingual dictionary induction, is an important capability that impacts many multilingual language processing tasks. Recent research has shown that word translation can be achieved in an unsupervised manner, without parallel seed dictionaries or aligned corpora. However, state of the art methods unsupervised bilingual dictionary induction are based on generative adversarial models, and as such suffer from their well known problems of instability and hyper-parameter sensitivity. We present a statistical dependency-based approach to bilingual dictionary induction that is unsupervised – no seed dictionary or parallel corpora required; and introduces no adversary – therefore being much easier to train. Our method performs comparably to adversarial alternatives and outperforms prior non-adversarial methods.