Jou-An Chi


2025

Structured vs. Unstructured Inputs in LLMs: Evaluating the Semantic and Pragmatic Predictive Power in Abnormal Event Forecasting
Jou-An Chi | Shu-Kai Hsieh
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)

Large Language Models (LLMs) are increasingly applied to temporally grounded reasoning tasks, yet the role of input representation remains unclear. This paper compares structured temporal inputs, represented as Temporal Knowledge Graphs (TKGs), with unstructured captions in two settings: forecasting future events and detecting anomalies in surveillance video descriptions. To enable direct comparison, we build a unified dataset by aligning anomaly labels from UCF-Crime with caption annotations from UCA. Experiments show that unstructured captions consistently yield slightly higher scores across both tasks, but the differences do not reach statistical significance. Their trade-offs, however, differ: captions provide richer semantic cues for generation, while TKGs reduce input length, suppress noise, and enhance interpretability. These findings suggest that action-centric corpora, such as surveillance or forensic narratives, naturally lend themselves to structured representations, which can provide temporal scaffolds for timeline reconstruction and more traceable reasoning. All code, data processing scripts, and experimental results are available at our GitHub repository.

2024

Extending the BabyLM Initiative: Promoting Diversity in Datasets and Metrics through High-Quality Linguistic Corpora
Laurent Prévot | Sheng-Fu Wang | Jou-An Chi | Shu-Kai Hsieh
The 2nd BabyLM Challenge at the 28th Conference on Computational Natural Language Learning

BabyLM paves the way for a range of experiments aimed at better understanding language models (LMs) and the differences and similarities between human and artificial language learning. However, the current framework is limited to the English language and a narrow but significant range of evaluation metrics, primarily focused on syntax, semantics, and pragmatics. In this paper, we propose some steps towards extending the framework to other languages, specifically Mandarin Chinese and French, leveraging existing linguistic resources for these languages. Additionally, we advocate for greater exploration of genre variations within subcorpora for training LMs, as well as for the adoption of additional evaluation metrics with different underlying principles. Our proposal consists of using high-quality spontaneous speech corpora as a source for extracting production-related variables, which the models are then fine-tuned to predict. We hypothesize that these production-related features offer insights into the language processing mechanisms underlying the data and that cognitively sensitive models should outperform others in predicting these features. Specifically, we propose focusing on the prediction of phenomena such as speech reductions, prosodic prominences, sequences co-occurring with listeners’ backchannels, and disfluencies. To illustrate our approach, we present an example involving the prediction of speech reductions in spontaneous speech in two different languages (French and English), using models trained on 10 million tokens from different data source mixtures. Although the results are preliminary, they suggest that this task can characterize models for predicting human language processing.

2023

Evaluating Interfaced LLM Bias
Kai-Ching Yeh | Jou-An Chi | Da-Chen Lian | Shu-Kai Hsieh
Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023)