TFB at SemEval-2026 Task 4: Diagnosing Model Failures in Narrative Understanding

Anna Colli; Benedictus Kent Rachmat; Eve Sauvage; Delphine Battistelli; Thomas Gerald; Cyril Grouin; Julien Tourille; Zheng Zhang

Abstract

We describe the participation of team TFB in SemEval-2026 Task 4 on narrative similarity. We explore ColBERT-inspired sentence-level late interaction to capture event reordering; compare fine-tuning on synthetic data at multiple difficulty tiers, finding that distribution proximity to the target data matters more than volume; and evaluate chain-of-thought prompting. We complement these approaches with a human annotation study (Krippendorff’s alpha = 0.32) that confirms the task’s inherent difficulty, and with an analysis of synthetic data distribution shift that explains why fine-tuning on out-of-distribution data hurts model performance. None of these approaches surpassed sentence-t5-xxl on Track B or Qwen2.5-7B on Track A, so we submitted these two models for the task.
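As a rough illustration of what sentence-level late interaction can look like, the sketch below applies a ColBERT-style MaxSim score to per-sentence embeddings of two stories. It is not the authors' implementation: the function names, the toy 3-dimensional embeddings, and the choice to average (rather than sum) the per-sentence maxima are all assumptions made for the example, and in practice the embeddings would come from a sentence encoder.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def late_interaction_score(story_a, story_b):
    """ColBERT-style MaxSim over sentence embeddings (hypothetical helper).

    story_a: array of shape (n_sentences_a, dim)
    story_b: array of shape (n_sentences_b, dim)
    Each sentence of story_a is matched to its most similar sentence in
    story_b, and the matches are averaged (an assumption; ColBERT sums them).
    """
    a = l2_normalize(np.asarray(story_a, dtype=float))
    b = l2_normalize(np.asarray(story_b, dtype=float))
    sim = a @ b.T  # pairwise cosine similarities, shape (n_a, n_b)
    return float(sim.max(axis=1).mean())


# Toy embeddings standing in for encoder output (assumed values).
story_a = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
story_b = np.array([[0.0, 2.0, 0.0],
                    [3.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0]])

score = late_interaction_score(story_a, story_b)
score_reordered = late_interaction_score(story_a, story_b[::-1])
```

Because each sentence is matched independently, this score does not change when the sentences of either story are permuted, which is one reason a MaxSim-style score is a natural candidate when two narratives present the same events in a different order.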

Anthology ID: 2026.semeval-1.367
Volume: Proceedings of the 20th International Workshop on Semantic Evaluation (2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Ekaterina Kochmar, Debanjan Ghosh, Kai North, Mamoru Komachi
Venues: SemEval | WS
Publisher: Association for Computational Linguistics
Pages: 2932–2938
URL: https://aclanthology.org/2026.semeval-1.367/
Cite (ACL): Anna Colli, Benedictus Kent Rachmat, Eve Sauvage, Delphine Battistelli, Thomas Gerald, Cyril Grouin, Julien Tourille, and Zheng Zhang. 2026. TFB at SemEval-2026 Task 4: Diagnosing Model Failures in Narrative Understanding. In Proceedings of the 20th International Workshop on Semantic Evaluation (2026), pages 2932–2938, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): TFB at SemEval-2026 Task 4: Diagnosing Model Failures in Narrative Understanding (Colli et al., SemEval 2026)
PDF: https://aclanthology.org/2026.semeval-1.367.pdf
