Quotation Retrieval System for Bulgarian Media Content

Svetla Koeva, Ivelina Stoyanova, Martin Yalamov


Abstract
This paper presents a method for automatic retrieval and attribution of quotations from media texts in Bulgarian. It involves recognition of report verbs (including their analytical forms) and of syntactic patterns introducing quotations, as well as attribution of each quote to its source through identification of personal names, descriptors, and anaphora. The method is implemented in a fully functional online system which offers a live service that processes media content and extracts quotations on a daily basis. The system collects and processes written news texts from six Bulgarian media websites. The results are presented in a structured form with descriptions, along with sorting and filtering functionality that facilitates the monitoring and analysis of media content. The method has also been applied to extract quotations from English texts and can be adapted to other languages, provided that the respective language-specific resources are supplied.
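The general idea of combining a report-verb lexicon with syntactic patterns can be illustrated with a minimal sketch. It is not the authors' implementation: the verb list, the regular expressions, and the function name `extract_quotations` are all hypothetical, the example uses English text, and a run of capitalised tokens stands in for real name recognition. Analytical verb forms, descriptors, and anaphora resolution, which the abstract names as part of the method, are not covered here.

```python
import re

# Hypothetical, tiny lexicon of report verbs (an assumption for illustration,
# not the language resource used by the system described above).
REPORT_VERBS = ("said", "stated", "announced", "added", "claimed")

_VERBS = "|".join(REPORT_VERBS)
# One to three capitalised tokens stand in for proper name recognition.
_NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+){0,2}"
# Text between straight or curly double quotes.
_QUOTE = r'["“](?P<quote>[^"”]+)["”]'

# Three simplified syntactic patterns that introduce a quotation.
PATTERNS = [
    # "quote," said Name
    re.compile(rf"{_QUOTE},?\s+(?P<verb>{_VERBS})\s+(?P<source>{_NAME})"),
    # "quote," Name said
    re.compile(rf"{_QUOTE},?\s+(?P<source>{_NAME})\s+(?P<verb>{_VERBS})"),
    # Name said: "quote"
    re.compile(rf"(?P<source>{_NAME})\s+(?P<verb>{_VERBS})[:,]?\s+{_QUOTE}"),
]


def extract_quotations(text):
    """Return a list of dicts with the quote, the report verb, and the source."""
    results = []
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            results.append({
                # Drop trailing punctuation that sits inside the quote marks.
                "quote": match.group("quote").rstrip(",. "),
                "verb": match.group("verb"),
                "source": match.group("source"),
            })
    return results


if __name__ == "__main__":
    sample = '"The budget will be approved," said Ivan Petrov.'
    for item in extract_quotations(sample):
        print(item)
```

Adapting such a sketch to Bulgarian would at least require a Cyrillic-aware name pattern and a Bulgarian verb lexicon, which corresponds to the language-specific resources mentioned in the abstract.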
Anthology ID:
2016.clib-1.8
Volume:
Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016)
Month:
September
Year:
2016
Address:
Sofia, Bulgaria
Venue:
CLIB
Publisher:
Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences
Pages:
64–73
URL:
https://aclanthology.org/2016.clib-1.8
Cite (ACL):
Svetla Koeva, Ivelina Stoyanova, and Martin Yalamov. 2016. Quotation Retrieval System for Bulgarian Media Content. In Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016), pages 64–73, Sofia, Bulgaria. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences.
Cite (Informal):
Quotation Retrieval System for Bulgarian Media Content (Koeva et al., CLIB 2016)
PDF:
https://aclanthology.org/2016.clib-1.8.pdf