Yiming Ma


2026

In this paper, we present the first cross-lingual dataset capturing a transnational cultural phenomenon, focusing on the Chinese and Japanese "Jirai" subculture and its association with risky health behaviors. Our dataset of more than 15,000 annotated social media posts forms the core of JiraiBench, a benchmark designed to evaluate LLMs on culturally specific content. This resource allowed us to uncover an unexpected cross-cultural transfer effect: Japanese prompts outperform Chinese prompts when processing Chinese content, indicating that cultural context can matter more than linguistic similarity. Further evidence suggests potential cross-lingual knowledge transfer in fine-tuned models. This work demonstrates the importance of culturally informed, cross-lingual datasets for building effective content moderation tools that protect vulnerable communities across linguistic boundaries.

2025

LLMs have improved the fluency and informativeness of abstractive summarization but remain prone to hallucinations, where generated content deviates from the source document. Recent decoding strategies based on pointwise mutual information (PMI) mitigate over-reliance on prior knowledge by contrasting output probabilities computed with and without the source document, enhancing contextual utilization and improving faithfulness. However, existing strategies often neglect the explicit use of salient contextual information and rely on static hyperparameters to balance contextual and prior knowledge, limiting their flexibility. In this work, we propose Salience-Aware Reinforced Adaptive decoding (SARA), which incorporates salient information and allows the model to adaptively determine, via PMI, how much to rely on the source document's context, the salient context, and the model's prior knowledge. Moreover, SARA introduces a tokenwise adaptive decoding mechanism, trained with reinforcement learning, that dynamically adjusts the contributions of context and prior knowledge at each decoding timestep. Experiments on the CNN/DM, WikiHow, and NYT50 datasets show that SARA consistently improves the quality and faithfulness of summaries across various LLM backbones without modifying their weights.
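The PMI-based rescoring that SARA builds on can be illustrated with a minimal sketch. The function name, the toy logits, and the static trade-off weight `lam` below are illustrative assumptions, not the paper's implementation; SARA's contribution is precisely to replace such a fixed `lam` with a tokenwise, learned adjustment.

```python
import numpy as np

def pmi_adjusted_scores(logits_with_source, logits_without_source, lam=0.5):
    """Rescore next-token logits by pointwise mutual information.

    score(y) = log p(y | source, prefix) - lam * log p(y | prefix),
    up-weighting tokens whose probability depends on the source document
    relative to tokens the model would emit from prior knowledge alone.
    lam is a static hyperparameter here; adaptive methods like SARA
    instead adjust this balance at each decoding timestep.
    """
    def log_softmax(x):
        x = x - x.max()
        return x - np.log(np.exp(x).sum())

    lp_ctx = log_softmax(np.asarray(logits_with_source, dtype=float))
    lp_prior = log_softmax(np.asarray(logits_without_source, dtype=float))
    return lp_ctx - lam * lp_prior

# Toy example: token 2 is far likelier given the source than without it,
# so PMI rescoring ranks it first even though token 1 has high prior mass.
with_src = [1.0, 2.0, 5.0]
without_src = [1.0, 4.0, 1.0]
scores = pmi_adjusted_scores(with_src, without_src)
best = int(np.argmax(scores))  # -> 2
```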