Xinrui Hu


2025

Seeing the Same Story Differently: Framing-Divergent Event Coreference for Computational Framing Analysis
Jin Zhao | Xinrui Hu | Nianwen Xue
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

News articles often describe the same real-world event in strikingly different ways, shaping perception through framing rather than factual disagreement. However, traditional computational framing approaches often rely on coarse-grained topic classification, limiting their ability to capture subtle, event-level differences in how the same occurrences are presented across sources. We introduce Framing-divergent Event Coreference (FrECo), a novel task that identifies pairs of event mentions referring to the same underlying occurrence but differing in framing across documents, providing an event-centric lens for computational framing analysis. To support this task, we construct the high-agreement and diverse FrECo corpus. We evaluate the FrECo task on this corpus through supervised and preference-based tuning of large language models, establishing strong baseline performance. To scale beyond the annotated data, we develop a bootstrapped mining pipeline that iteratively expands the training set with high-confidence FrECo pairs. Our approach enables scalable, interpretable analysis of how media frame the same events differently, offering a new lens for contrastive framing analysis at the event level.
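
The bootstrapped mining pipeline mentioned in the abstract can be pictured roughly as follows. This is a minimal illustrative sketch rather than the paper's implementation: the `train_scorer` callback, the confidence threshold, the round count, and the `MentionPair` structure are all assumptions introduced here for illustration.

```python
# Illustrative sketch of a bootstrapped FrECo mining loop (hypothetical names;
# not the authors' code). A pairwise scorer is trained on seed annotations,
# then used to harvest high-confidence framing-divergent coreference pairs
# from unlabeled cross-document mention pairs, which are folded back into
# the training set for the next round.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MentionPair:
    mention_a: str  # event mention with sentence context, document A
    mention_b: str  # event mention with sentence context, document B


def bootstrap_frec_mining(
    seed_pairs: List[Tuple[MentionPair, bool]],      # annotated FrECo pairs
    unlabeled_pairs: List[MentionPair],              # candidate cross-document pairs
    train_scorer: Callable[[List[Tuple[MentionPair, bool]]], Callable[[MentionPair], float]],
    rounds: int = 3,
    threshold: float = 0.95,
) -> List[Tuple[MentionPair, bool]]:
    """Iteratively expand the training set with high-confidence FrECo pairs."""
    train_set = list(seed_pairs)
    pool = list(unlabeled_pairs)
    for _ in range(rounds):
        score = train_scorer(train_set)              # e.g. a tuned LLM classifier
        confident, remaining = [], []
        for pair in pool:
            p = score(pair)                          # P(framing-divergent coreference)
            if p >= threshold:
                confident.append((pair, True))
            elif p <= 1.0 - threshold:
                confident.append((pair, False))
            else:
                remaining.append(pair)
        if not confident:
            break                                    # nothing confident left; stop early
        train_set.extend(confident)
        pool = remaining
    return train_set
```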

Beyond Benchmarks: Building a Richer Cross-Document Event Coreference Dataset with Decontextualization
Jin Zhao | Jingxuan Tu | Bingyang Ye | Xinrui Hu | Nianwen Xue | James Pustejovsky
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Cross-Document Event Coreference (CDEC) annotation is challenging and difficult to scale, so existing datasets are small and lack diversity. We introduce a new approach that leverages large language models (LLMs) to decontextualize event mentions, simplifying the document-level annotation task to sentence pairs with enriched context and enabling the creation of the Richer EventCorefBank (RECB), a denser and more expressive dataset annotated at greater speed. Decontextualization has been shown to improve annotation speed without compromising quality and to enhance model performance. Our baseline experiment indicates that systems trained on RECB achieve comparable results on the EventCorefBank (ECB+) test set, demonstrating the high quality of our dataset and its generalizability to other CDEC datasets. In addition, our evaluation shows that strong baseline models still struggle with RECB compared to other CDEC datasets, suggesting that the richness and diversity of RECB present significant challenges to current CDEC systems.
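
A rough sketch of how LLM-based decontextualization might turn a document-level coreference decision into an enriched sentence-pair annotation item is given below. The prompt wording, the `complete` helper, and the item layout are assumptions for illustration, not the RECB pipeline itself.

```python
# Illustrative sketch of LLM-based decontextualization for CDEC annotation
# (hypothetical prompt and helper; not the RECB pipeline). Each event
# mention's sentence is rewritten so it can be judged without the full
# document, and two such sentences form one standalone annotation item.

from typing import Callable, Optional


def decontextualize(sentence: str, document: str, complete: Callable[[str], str]) -> str:
    """Ask an LLM to rewrite `sentence` so it is interpretable on its own,
    resolving pronouns and adding minimal context drawn from `document`."""
    prompt = (
        "Rewrite the sentence so it can be understood without the rest of the "
        "document. Resolve pronouns and add only the context that is needed.\n\n"
        f"Document:\n{document}\n\nSentence:\n{sentence}\n\nRewritten sentence:"
    )
    return complete(prompt).strip()


def make_annotation_item(
    sent_a: str, doc_a: str, sent_b: str, doc_b: str, complete: Callable[[str], str]
) -> dict:
    """Pair two decontextualized mention sentences for a coreference judgment."""
    item: dict[str, Optional[str]] = {
        "sentence_a": decontextualize(sent_a, doc_a, complete),
        "sentence_b": decontextualize(sent_b, doc_b, complete),
        "label": None,  # annotator decides: coreferent or not
    }
    return item
```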