PARSEME Survey on MWE Resources

Gyri Smørdal Losnegaard, Federico Sangati, Carla Parra Escartín, Agata Savary, Sascha Bargmann, Johanna Monti


Abstract
This paper summarizes the preliminary results of an ongoing survey on multiword resources carried out within the IC1207 COST Action PARSEME (PARSing and Multi-word Expressions). Despite the availability of language resource catalogs and the inventory of multiword datasets on the SIGLEX-MWE website, multiword resources are scattered and difficult to find. In many cases, language resources such as corpora, treebanks, or lexical databases include multiwords as part of their data or take them into account in their annotations. However, these resources need to be centralized to make them accessible. The aim of this survey is to create a portal where researchers can easily find multiword(-aware) language resources for their research. We report on the design of the survey and analyze the data gathered so far. We also discuss the problems we have detected upon examination of the data as well as possible ways of enhancing the survey.
Anthology ID:
L16-1364
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2299–2306
URL:
https://aclanthology.org/L16-1364
Cite (ACL):
Gyri Smørdal Losnegaard, Federico Sangati, Carla Parra Escartín, Agata Savary, Sascha Bargmann, and Johanna Monti. 2016. PARSEME Survey on MWE Resources. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2299–2306, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
PARSEME Survey on MWE Resources (Losnegaard et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1364.pdf