Constraint Based Description of Polish Multiword Expressions

Roman Kurc, Maciej Piasecki, Bartosz Broda


Abstract
We present an approach to the description of Polish Multi-word Expressions (MWEs) that is based on expressions in WCCL, a language of morpho-syntactic constraints, instead of grammar rules or transducers. For each MWE, its basic morphological form and the base forms of its constituents are specified, and the MWE is also assigned to a class on the basis of its syntactic structure. For each class, a WCCL constraint is defined that is parametrised by string variables referring to the base forms or inflected forms of the MWE constituents. The constraint specifies a minimal set of conditions that must be fulfilled in order to recognise an occurrence of the given MWE in text with high accuracy. Our formalism is focused on the efficient description of large MWE lexicons for use in text processing. The formalism allows for a relatively easy representation of flexible word order and discontinuous constructions. Moreover, the grammatical structure of an MWE does not need to be fully specified: only those aspects of a particular MWE's structure that are needed to reach the target recognition accuracy have to be selected. On the basis of a set of simple heuristics, the WCCL-based representation of MWEs can be generated automatically from a list of MWE base forms. The proposed representation was applied on a practical scale to describe a large set of Polish MWEs included in plWordNet.
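The core idea of the abstract (one generic constraint per syntactic class, instantiated with each MWE's constituent base forms and checking only a minimal set of conditions) can be illustrated with a small Python sketch. This is not WCCL and does not reproduce the authors' actual constraints: the class label `NounAdj`, the helper names, the `(orth, base, tag)` token tuples and the simplified NKJP-style tags (`pos:number:case:gender[:...]`) are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical token format: (orth, base, tag), with simplified NKJP-style tags.
Token = Tuple[str, str, str]
Span = Tuple[int, int]


@dataclass
class MWE:
    base_form: str           # basic morphological form of the whole MWE
    constituents: List[str]  # constituent base forms, used as the constraint's string parameters
    mwe_class: str           # hypothetical label of the MWE's syntactic class


def agree(tag_a: str, tag_b: str) -> bool:
    """Simplified agreement test: number, case and gender fields must be identical."""
    return tag_a.split(":")[1:4] == tag_b.split(":")[1:4]


def noun_adj(tokens: Sequence[Token], params: List[str], max_gap: int = 2) -> List[Span]:
    """Hypothetical class constraint for adjective + noun MWEs.

    Checks only base forms, parts of speech, agreement and a bounded distance,
    so either word order is accepted and up to `max_gap` tokens may intervene.
    """
    noun_base, adj_base = params
    spans: List[Span] = []
    for i, (_, base_i, tag_i) in enumerate(tokens):
        if base_i != noun_base or not tag_i.startswith("subst:"):
            continue
        # Candidate adjective positions at most max_gap tokens away on either side.
        for j in range(max(0, i - max_gap - 1), min(len(tokens), i + max_gap + 2)):
            _, base_j, tag_j = tokens[j]
            if j != i and base_j == adj_base and tag_j.startswith("adj:") and agree(tag_i, tag_j):
                spans.append((min(i, j), max(i, j)))
    return spans


# One generic constraint per class, instantiated with each MWE's own parameters.
CLASS_CONSTRAINTS: Dict[str, Callable[..., List[Span]]] = {
    "NounAdj": noun_adj,
}


def find_occurrences(tokens: Sequence[Token], mwe: MWE) -> List[Span]:
    """Look up the MWE's class constraint and apply it with the MWE's base forms."""
    return CLASS_CONSTRAINTS[mwe.mwe_class](tokens, mwe.constituents)


czerwona_kartka = MWE("czerwona kartka", ["kartka", "czerwony"], "NounAdj")
sentence = [("Dostał", "dostać", "praet:sg:m1:perf"),
            ("czerwoną", "czerwony", "adj:sg:acc:f:pos"),
            ("kartkę", "kartka", "subst:sg:acc:f")]
print(find_occurrences(sentence, czerwona_kartka))  # [(1, 2)]
```

Because each lexicon entry only supplies parameters (base forms and a class label), adding a new MWE of an existing class requires no new constraint code, which mirrors the abstract's focus on describing large lexicons efficiently.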
Anthology ID:
L12-1613
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2408–2413
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/1027_Paper.pdf
Cite (ACL):
Roman Kurc, Maciej Piasecki, and Bartosz Broda. 2012. Constraint Based Description of Polish Multiword Expressions. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 2408–2413, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Constraint Based Description of Polish Multiword Expressions (Kurc et al., LREC 2012)