2023
Assessing the featural organisation of paradigms with distributional methods
Olivier Bonami, Lukáš Kyjánek, Marine Wauquier
Proceedings of the Society for Computation in Linguistics 2023
2022
Towards Universal Segmentations: UniSegments 1.0
Zdeněk Žabokrtský, Niyati Bafna, Jan Bodnár, Lukáš Kyjánek, Emil Svoboda, Magda Ševčíková, Jonáš Vidra
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Our work aims at developing a multilingual data resource for morphological segmentation. We present a survey of 17 existing data resources relevant for segmentation in 32 languages and analyze the diversity in how individual linguistic phenomena are captured across them. Inspired by the success of Universal Dependencies, we propose a harmonized scheme for representing segmentation and convert the data from the studied resources into this common scheme. Harmonized versions of the resources available under free licenses are published as a collection called UniSegments 1.0.
Constructing a Lexical Resource of Russian Derivational Morphology
Lukáš Kyjánek, Olga Lyashevskaya, Anna Nedoluzhko, Daniil Vodolazsky, Zdeněk Žabokrtský
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Words of any language are to some extent related through the ways they are formed. For instance, the verb ‘exempl-ify’ and the noun ‘example-s’ are both based on the word ‘example’, but the verb is derived from it, while the noun is inflected. In Natural Language Processing of Russian, inflection is handled satisfactorily; however, there are only a few machine-tractable resources that capture derivation, even though both of these morphological processes are very rich in Russian. Therefore, we devote this paper to improving one of the methods for constructing such resources and to applying the method to a Russian lexicon, which results in the largest lexical resource of Russian derivational relations. The resulting database, dubbed DeriNet.RU, includes more than 300 thousand lexemes connected by more than 164 thousand binary derivational relations. To create these data, we combined existing machine-learning methods, which we improved for this purpose. The whole approach is evaluated on our newly created data set of manual, parallel annotation. DeriNet.RU is freely available under an open license.
Web-based Annotation Interface for Derivational Morphology
Lukáš Kyjánek
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: System Demonstrations
The paper presents a visual interface for manual annotation of language resources for derivational morphology. The interface is web-based and built with relatively simple programming techniques, and yet it considerably facilitates and speeds up the annotation process, especially in languages with rich derivational morphology. As such, it can reduce the cost of annotation. After introducing the manual annotation tasks in derivational morphology, the paper describes the new visual interface and presents a case study that compares the current annotation method with annotation using the interface. In addition, it demonstrates that the interface can also be used for manual annotation of syntactic trees. The source code is freely available under the MIT License on GitHub.
2019
DeriNet 2.0: Towards an All-in-One Word-Formation Resource
Jonáš Vidra, Zdeněk Žabokrtský, Magda Ševčíková, Lukáš Kyjánek
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology
Universal Derivations Kickoff: A Collection of Harmonized Derivational Resources for Eleven Languages
Lukáš Kyjánek, Zdeněk Žabokrtský, Magda Ševčíková, Jonáš Vidra
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology