Impact of ASR Transcriptions on French Spoken Coreference Resolution

Kirill Milintsevich


Abstract
This study introduces a new ASR-transcribed coreference corpus for French and explores the transferability of coreference resolution models from human-transcribed to ASR-transcribed data. Given the challenges posed by differences in text characteristics and errors introduced by ASR systems, we evaluate model performance using newly constructed parallel human-ASR silver training and gold validation datasets. Our findings show a decline in performance on ASR data for models trained on manual transcriptions. However, combining silver ASR data with gold manual data enhances model robustness. Through detailed error analysis, we observe that models emphasizing recall are more resilient to ASR-induced errors than those focusing on precision. The resulting ASR corpus, along with all related materials, is freely available under the CC BY-NC-SA 4.0 license at: https://github.com/ina-foss/french-asr-coreference.
Anthology ID:
2025.crac-1.8
Volume:
Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference
Month:
November
Year:
2025
Address:
Suzhou, China
Editors:
Maciej Ogrodniczuk, Michal Novak, Massimo Poesio, Sameer Pradhan, Vincent Ng
Venue:
CRAC
Publisher:
Association for Computational Linguistics
Pages:
85–94
URL:
https://aclanthology.org/2025.crac-1.8/
Cite (ACL):
Kirill Milintsevich. 2025. Impact of ASR Transcriptions on French Spoken Coreference Resolution. In Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference, pages 85–94, Suzhou, China. Association for Computational Linguistics.
Cite (Informal):
Impact of ASR Transcriptions on French Spoken Coreference Resolution (Milintsevich, CRAC 2025)
PDF:
https://aclanthology.org/2025.crac-1.8.pdf