The F1 of Formula One: Applicability of Pre-trained NER Models to Brazilian TV Interview Transcripts

João Pedro Gonçalves Munhoz, Luiz Felipe Guidorizzi de Oliveira, Isabella Belchior, Evandro Eduardo Seron Ruiz, Oto Araújo Vale


Abstract
Recorded interviews can capture their subjects’ memories, perceptions, and emotions. When conducted with notable figures, they can also serve as a resource for interdisciplinary research across various branches of science. This work marks the beginning of a significant project analysing interviews from the Roda Viva program, the longest-running interview show on Brazilian television. In this initial study, we examined six memorable interviews with six Brazilian Formula One drivers to compare the performance of two named entity recognition methods: a statistical-neural method and large language models, both evaluated against manual annotations. The comparison highlighted relevant qualitative distinctions. The statistical method showed a rigid dependence on capitalisation and lexical familiarity, leading to mechanical false positives and missed non-capitalised entities. The LLM exhibited greater linguistic sensitivity, retrieving contextual entities and remaining robust to transcription errors, though it still produced false positives. The LLM-based model appears more promising because of its flexibility and the potential for refinement via instructions that filter out ambiguous cases, favouring the automation of social network extraction in the corpus.
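As a rough illustration of what evaluating a NER system against manual annotations can look like, the sketch below computes entity-level precision, recall, and F1. It is not the authors' evaluation protocol, which the abstract does not specify. The function name `entity_prf`, the `(text, label)` entity representation, the case-insensitive exact matching, and the example entities are all assumptions made for illustration; the example data is invented and does not come from the paper.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: an entity is a (surface text, label) pair.
Entity = Tuple[str, str]


def _normalise(entities: Iterable[Entity]) -> Set[Entity]:
    # Assumption: matching ignores case and surrounding whitespace,
    # and duplicate mentions are collapsed into one entry.
    return {(text.strip().casefold(), label) for text, label in entities}


def entity_prf(gold: Iterable[Entity],
               predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for exact entity matches."""
    gold_set = _normalise(gold)
    pred_set = _normalise(predicted)
    true_positives = len(gold_set & pred_set)

    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example data, for illustration only.
gold = [("Ayrton Senna", "PER"), ("McLaren", "ORG"),
        ("Interlagos", "LOC"), ("Roda Viva", "MISC")]
predicted = [("Ayrton Senna", "PER"), ("McLaren", "ORG"),
             ("Senhor", "PER")]  # a capitalisation-driven false positive

precision, recall, f1 = entity_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A set-based comparison like this ignores mention positions and frequencies. A span-level evaluation over token offsets would be stricter and might rank the two methods differently.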
Anthology ID:
2026.propor-2.40
Volume:
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2
Month:
April
Year:
2026
Address:
Salvador, Brazil
Editors:
Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue:
PROPOR
Publisher:
Association for Computational Linguistics
Pages:
303–311
URL:
https://aclanthology.org/2026.propor-2.40/
Cite (ACL):
João Pedro Gonçalves Munhoz, Luiz Felipe Guidorizzi de Oliveira, Isabella Belchior, Evandro Eduardo Seron Ruiz, and Oto Araújo Vale. 2026. The F1 of Formula One: Applicability of Pre-trained NER Models to Brazilian TV Interview Transcripts. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2, pages 303–311, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal):
The F1 of Formula One: Applicability of Pre-trained NER Models to Brazilian TV Interview Transcripts (Munhoz et al., PROPOR 2026)
PDF:
https://aclanthology.org/2026.propor-2.40.pdf