A Canonical Form for Flexible Multiword Expressions

Jan Odijk, Martin Kroon


Abstract
This paper proposes a canonical form for Multiword Expressions (MWEs), in particular for Dutch. The canonical form can be enriched with annotations that describe the properties of the MWE and its components. The paper also introduces DUCAME (DUtch CAnonical Multiword Expressions), a lexical resource containing more than 11k MWEs in canonical form. MWE-Finder uses DUCAME to automatically generate queries for finding flexible MWEs in large text corpora.
Anthology ID:
2024.lrec-main.8
Volume:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:
LREC | COLING
Publisher:
ELRA and ICCL
Pages:
91–101
URL:
https://aclanthology.org/2024.lrec-main.8
Cite (ACL):
Jan Odijk and Martin Kroon. 2024. A Canonical Form for Flexible Multiword Expressions. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 91–101, Torino, Italia. ELRA and ICCL.
Cite (Informal):
A Canonical Form for Flexible Multiword Expressions (Odijk & Kroon, LREC-COLING 2024)
PDF:
https://aclanthology.org/2024.lrec-main.8.pdf