SeCoDa: Sense Complexity Dataset

David Strohmaier, Sian Gooding, Shiva Taslimipoor, Ekaterina Kochmar


Abstract
The Sense Complexity Dataset (SeCoDa) is a corpus annotated jointly for complexity and word senses. It thus provides a valuable resource for both word sense disambiguation and complex word identification. The dataset is intended to support identifying complexity at the level of word senses rather than word tokens. For word sense annotation, SeCoDa uses a hierarchical scheme based on information available in the Cambridge Advanced Learner’s Dictionary. This allows us to offer more coarse-grained senses than are directly available in WordNet.
Anthology ID:
2020.lrec-1.730
Volume:
Proceedings of the Twelfth Language Resources and Evaluation Conference
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
5962–5967
Language:
English
URL:
https://aclanthology.org/2020.lrec-1.730
Cite (ACL):
David Strohmaier, Sian Gooding, Shiva Taslimipoor, and Ekaterina Kochmar. 2020. SeCoDa: Sense Complexity Dataset. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 5962–5967, Marseille, France. European Language Resources Association.
Cite (Informal):
SeCoDa: Sense Complexity Dataset (Strohmaier et al., LREC 2020)
PDF:
https://aclanthology.org/2020.lrec-1.730.pdf