Word-Formation Network for Czech

Magda Ševčíková; Zdeněk Žabokrtský

Word-Formation Network for Czech

Abstract

In the present paper, we describe the development of the lexical network DeriNet, which captures core word-formation relations on the set of around 266 thousand Czech lexemes. The network is currently limited to derivational relations because derivation is the most frequent and most productive word-formation process in Czech. This limitation is reflected in the architecture of the network: each lexeme is allowed to be linked up with just a single base word; composition as well as combined processes (composition with derivation) are thus not included. After a brief summarization of theoretical descriptions of Czech derivation and the state of the art of NLP approaches to Czech derivation, we discuss the linguistic background of the network and introduce the formal structure of the network and the semi-automatic annotation procedure. The network was initialized with a set of lexemes whose existence was supported by corpus evidence. Derivational links were created using three sources of information: links delivered by a tool for morphological analysis, links based on an automatically discovered set of derivation rules, and on a grammar-based set of rules. Finally, we propose some research topics which could profit from the existence of such lexical network.

Anthology ID:: L14-1416
Volume:: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:: May
Year:: 2014
Address:: Reykjavik, Iceland
Editors:: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:: LREC
SIG:
Publisher:: European Language Resources Association (ELRA)
Note:
Pages:: 1087–1093
Language:
URL:: http://www.lrec-conf.org/proceedings/lrec2014/pdf/501_Paper.pdf
DOI:
Bibkey:
Cite (ACL):: Magda Ševčíková and Zdeněk Žabokrtský. 2014. Word-Formation Network for Czech. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1087–1093, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):: Word-Formation Network for Czech (Ševčíková & Žabokrtský, LREC 2014)
Copy Citation:
PDF:: http://www.lrec-conf.org/proceedings/lrec2014/pdf/501_Paper.pdf

PDF Cite Search Fix data