Semi-Automatic Detection of Multiword Expressions in the Slovak Dependency Treebank

Daniela Majchrakova, Ondrej Dusek, Jan Hajic, Agata Karcova, Radovan Garabik


Abstract
We describe a method for semi-automatic extraction of Slovak multiword expressions (MWEs) from a dependency treebank. The process uses an automatic conversion from dependency syntactic trees to deep syntax and automatic tagging of verbal argument nodes based on a valency dictionary. Both the valency dictionary and the treebank conversion were adapted from the corresponding Czech versions; the automatically translated valency dictionary has been manually proofread and corrected. There are two main achievements: a valency dictionary of Slovak MWEs with direct links to the corresponding expressions in the Czech dictionary, PDT-Vallex, and a method for extracting MWEs from the Slovak Dependency Treebank. In a manual evaluation, the extraction reached very high precision but lower recall. This is a work in progress with a twofold overall goal: to create a Slovak valency dictionary paralleling the Czech one, with bilingual links, and to use the extracted verbal frames in a collocation dictionary of Slovak verbs.
Anthology ID:
2014.clib-1.5
Volume:
Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014)
Month:
September
Year:
2014
Address:
Sofia, Bulgaria
Venue:
CLIB
Publisher:
Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences
Pages:
32–39
URL:
https://aclanthology.org/2014.clib-1.5
Cite (ACL):
Daniela Majchrakova, Ondrej Dusek, Jan Hajic, Agata Karcova, and Radovan Garabik. 2014. Semi-Automatic Detection of Multiword Expressions in the Slovak Dependency Treebank. In Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014), pages 32–39, Sofia, Bulgaria. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences.
Cite (Informal):
Semi-Automatic Detection of Multiword Expressions in the Slovak Dependency Treebank (Majchrakova et al., CLIB 2014)
PDF:
https://aclanthology.org/2014.clib-1.5.pdf