Jaroslava Hlaváčová
Also published as: Jaroslava Hlavacova
2024
Charles Translator: A Machine Translation System between Ukrainian and Czech
Martin Popel | Lucie Polakova | Michal Novák | Jindřich Helcl | Jindřich Libovický | Pavel Straňák | Tomas Krabac | Jaroslava Hlavacova | Mariia Anisimova | Tereza Chlanova
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
We present Charles Translator, a machine translation system between Ukrainian and Czech, developed as part of a society-wide effort to mitigate the impact of the Russian-Ukrainian war on individuals and society. The system was developed in the spring of 2022 with the help of many language data providers in order to quickly meet the demand for such a service, which was not available at the time in the required quality. The translator was later made available through an online web interface and an Android app with speech input, both featuring Cyrillic-Latin script transliteration. The system translates directly, unlike other available systems that use English as a pivot, and thus takes advantage of the typological similarity of the two languages. It uses the block back-translation method, which allows for efficient use of monolingual training data. The paper describes the development process, including data collection, implementation, and evaluation, mentions several use cases, and outlines possibilities for further development of the system for educational purposes.
2020
Prague Dependency Treebank - Consolidated 1.0
Jan Hajič | Eduard Bejček | Jaroslava Hlavacova | Marie Mikulová | Milan Straka | Jan Štěpánek | Barbora Štěpánková
Proceedings of the Twelfth Language Resources and Evaluation Conference
We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is, as has always been the case for the family of Prague Dependency Treebanks, to serve both as training data for various types of NLP tasks and as a resource for linguistically oriented research. PDT-C 1.0 contains four different datasets of Czech, uniformly annotated using the standard PDT scheme (although not everything is annotated manually, as we describe in detail here). The texts come from different sources: daily newspaper articles, a Czech translation of the Wall Street Journal, transcribed dialogs, and a small amount of user-generated, short, often non-standard language segments typed into a web translator. Altogether, the treebank contains around 180,000 sentences with their morphological, surface and deep syntactic annotation. The diversity of the texts and annotations should serve NLP applications well, and it makes the treebank an invaluable resource for linguistic research, including comparative studies of texts of different genres. The corpus is publicly and freely available.
2017
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
Daniel Zeman | Martin Popel | Milan Straka | Jan Hajič | Joakim Nivre | Filip Ginter | Juhani Luotolahti | Sampo Pyysalo | Slav Petrov | Martin Potthast | Francis Tyers | Elena Badmaeva | Memduh Gokirmak | Anna Nedoluzhko | Silvie Cinková | Jan Hajič jr. | Jaroslava Hlaváčová | Václava Kettnerová | Zdeňka Urešová | Jenna Kanerva | Stina Ojala | Anna Missilä | Christopher D. Manning | Sebastian Schuster | Siva Reddy | Dima Taji | Nizar Habash | Herman Leung | Marie-Catherine de Marneffe | Manuela Sanguinetti | Maria Simi | Hiroshi Kanayama | Valeria de Paiva | Kira Droganova | Héctor Martínez Alonso | Çağrı Çöltekin | Umut Sulubacak | Hans Uszkoreit | Vivien Macketanz | Aljoscha Burchardt | Kim Harris | Katrin Marheinecke | Georg Rehm | Tolga Kayadelen | Mohammed Attia | Ali Elkahky | Zhuoran Yu | Emily Pitler | Saran Lertpradit | Michael Mandl | Jesse Kirchner | Hector Fernandez Alcalde | Jana Strnadová | Esha Banerjee | Ruli Manurung | Antonio Stella | Atsuko Shimada | Sookyoung Kwak | Gustavo Mendonça | Tatiana Lando | Rattima Nitisaroj | Josie Li
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, the task was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe how the data sets were prepared, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.
2014
Machine Translation of Medical Texts in the Khresmoi Project
Ondřej Dušek | Jan Hajič | Jaroslava Hlaváčová | Michal Novák | Pavel Pecina | Rudolf Rosa | Aleš Tamchyna | Zdeňka Urešová | Daniel Zeman
Proceedings of the Ninth Workshop on Statistical Machine Translation
2006
New Approach to Frequency Dictionaries - Czech Example
Jaroslava Hlaváčová
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Using the example of the recent edition of the Frequency Dictionary of Czech, we describe and explain some new general principles that should be followed to get better results in practical uses of frequency dictionaries. The main one is adopting average reduced frequency instead of absolute frequency for ordering items. The formula for calculating the average reduced frequency is presented in the contribution together with a brief explanation, including examples clarifying the difference between the measures. Then, the Frequency Dictionary of Czech and its parts are described.
2004
Derivational Relations in Flectional Languages - Czech Case
Jaroslava Hlaváčová | Jana Klímová
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
When a text in any language is submitted to morphological analysis, some words always remain unrecognized. We can lower their number by adding new words to the dictionary used by the morphological analyzer, but we can never capture the whole of the language. The system described in this paper (we call it the "derivation module") deals with unknown derived words. It aims not only at analyzing but also at synthesizing Czech derived words. Such a system is of particular value for automatic processing of languages where derivational morphology plays an important role in regular word formation.
2000
Co-authors
- Jan Hajic 3
- Michal Novák 2
- Martin Popel 2
- Milan Straka 2
- Zdenka Uresova 2
- Daniel Zeman 2
- Hector Fernandez Alcalde 1
- Mariia Anisimova 1
- Mohammed Attia 1
- Elena Badmaeva 1
- Esha Banerjee 1
- Eduard Bejček 1
- Aljoscha Burchardt 1
- Tereza Chlanova 1
- Silvie Cinková 1
- Kira Droganova 1
- Ondřej Dušek 1
- Ali Elkahky 1
- Filip Ginter 1
- Memduh Gökırmak 1
- Nizar Habash 1
- Jan Hajič jr. 1
- Kim Harris 1
- Jindřich Helcl 1
- Hiroshi Kanayama 1
- Jenna Kanerva 1
- Tolga Kayadelen 1
- Václava Kettnerová 1
- Jesse Kirchner 1
- Jana Klímová 1
- Tomas Krabac 1
- Sookyoung Kwak 1
- Tatiana Lando 1
- Saran Lertpradit 1
- Herman Leung 1
- Josie Li 1
- Jindřich Libovický 1
- Juhani Luotolahti 1
- Vivien Macketanz 1
- Michael Mandel 1
- Christopher D. Manning 1
- Ruli Manurung 1
- Katrin Marheinecke 1
- Héctor Martínez Alonso 1
- Gustavo Mendonca 1
- Marie Mikulová 1
- Anna Missilä 1
- Anna Nedoluzhko 1
- Rattima Nitisaroj 1
- Joakim Nivre 1
- Stina Ojala 1
- Pavel Pecina 1
- Slav Petrov 1
- Emily Pitler 1
- Lucie Poláková 1
- Martin Potthast 1
- Sampo Pyysalo 1
- Siva Reddy 1
- Georg Rehm 1
- Rudolf Rosa 1
- Manuela Sanguinetti 1
- Sebastian Schuster 1
- Atsuko Shimada 1
- Maria Simi 1
- Antonio Stella 1
- Pavel Straňák 1
- Jana Strnadová 1
- Umut Sulubacak 1
- Dima Taji 1
- Aleš Tamchyna 1
- Francis Tyers 1
- Hans Uszkoreit 1
- Zhuoran Yu 1
- Marie-Catherine de Marneffe 1
- Valeria de Paiva 1
- Çağrı Çöltekin 1
- Jan Štěpánek 1
- Barbora Štěpánková 1