Yukiko Ishizuki


2025

Evaluating Model Alignment with Human Perception: A Study on Shitsukan in LLMs and LVLMs
Daiki Shiono | Ana Brassard | Yukiko Ishizuki | Jun Suzuki
Proceedings of the 31st International Conference on Computational Linguistics

We evaluate the alignment of large language models (LLMs) and large vision-language models (LVLMs) with human perception, focusing on the Japanese concept of *shitsukan*, which reflects the sensory experience of perceiving objects. We created a dataset of *shitsukan* terms elicited from individuals in response to object images. With it, we designed benchmark tasks for three dimensions of understanding *shitsukan*: (1) accurate perception in object images, (2) commonsense knowledge of typical *shitsukan* terms for objects, and (3) distinction of valid *shitsukan* terms. Models demonstrated mixed accuracy across benchmark tasks, with limited overlap between model- and human-generated terms. However, manual evaluations revealed that the model-generated terms were still natural to humans. This work identifies gaps in culture-specific understanding and contributes to aligning models with human sensory perception. We publicly release the dataset to encourage further research in this area.

2024

To Drop or Not to Drop? Predicting Argument Ellipsis Judgments: A Case Study in Japanese
Yukiko Ishizuki | Tatsuki Kuribayashi | Yuichiroh Matsubayashi | Ryohei Sasano | Kentaro Inui
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Speakers sometimes omit certain arguments of a predicate in a sentence; such omission is especially frequent in pro-drop languages. This study addresses the question of what explains native speakers' ellipsis decisions, motivated by interest in human discourse processing and in writing assistance for this choice. To this end, we first collect large-scale human annotations of whether and why a particular argument should be omitted, covering over 2,000 data points in a balanced corpus of Japanese, a prototypical pro-drop language. The data indicate that native speakers broadly share common criteria for such judgments, and they further clarify the judgments' quantitative characteristics, e.g., the distribution of related linguistic factors in the balanced corpus. Furthermore, we examine the performance of a language model-based argument ellipsis judgment model and reveal gaps between the system's predictions and human judgments in specific linguistic aspects. We hope our fundamental resource encourages further studies on natural human ellipsis judgment.