Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach

Maciej Janicki, Antti Kanner, Eetu Mäkelä


Abstract
We address the problem of detecting and attributing quotes in Finnish news media. Solving this task would enable large-scale analysis of the media with respect to which voices and opinions are present and how they are presented. We describe the annotation of a corpus of around 1,500 media articles with quote attribution and coreference information. Further, we compare two methods for automatic quote recognition: a rule-based method operating on dependency trees and a machine-learning method built on top of the BERT language model. We conclude that BERT provides more promising results even with little training data, achieving an F-score of 95% on direct quote recognition and 84% on indirect quotes. Finally, we discuss open problems and further associated tasks, especially the need to resolve speaker mentions to entity references.
Anthology ID: 2023.nodalida-1.6
Volume: Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month: May
Year: 2023
Address: Tórshavn, Faroe Islands
Editors: Tanel Alumäe, Mark Fishel
Venue: NoDaLiDa
Publisher: University of Tartu Library
Pages: 52–59
URL: https://aclanthology.org/2023.nodalida-1.6
Cite (ACL): Maciej Janicki, Antti Kanner, and Eetu Mäkelä. 2023. Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 52–59, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal): Detection and attribution of quotes in Finnish news media: BERT vs. rule-based approach (Janicki et al., NoDaLiDa 2023)
PDF: https://aclanthology.org/2023.nodalida-1.6.pdf