Quotation Detection and Classification with a Corpus-Agnostic Model

Sean Papay, Sebastian Padó


Abstract
The detection of quotations (i.e., reported speech, thought, and writing) has established itself as an NLP analysis task. However, state-of-the-art models have been developed on the basis of specific corpora and incorporate a high degree of corpus-specific assumptions and knowledge, which leads to fragmentation. In the spirit of task-agnostic modeling, we present a corpus-agnostic neural model for quotation detection and evaluate it on three corpora that vary in language, text genre, and structural assumptions. The model (a) approaches the state-of-the-art on the corpora when using established feature sets and (b) shows reasonable performance even when using solely word forms, which makes it applicable for non-standard (i.e., historical) corpora.
Anthology ID:
R19-1103
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)
Month:
September
Year:
2019
Address:
Varna, Bulgaria
Editors:
Ruslan Mitkov, Galia Angelova
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
888–894
URL:
https://aclanthology.org/R19-1103
DOI:
10.26615/978-954-452-056-4_103
Cite (ACL):
Sean Papay and Sebastian Padó. 2019. Quotation Detection and Classification with a Corpus-Agnostic Model. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019), pages 888–894, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Quotation Detection and Classification with a Corpus-Agnostic Model (Papay & Padó, RANLP 2019)
PDF:
https://aclanthology.org/R19-1103.pdf