European Union Language Resources in Sketch Engine

Vít Baisa, Jan Michelfeit, Marek Medveď, Miloš Jakubíček


Abstract
Several parallel corpora built from European Union language resources are presented here. They were processed with state-of-the-art tools and made available to researchers in the corpus manager Sketch Engine. A completely new resource is introduced: the EUR-Lex Corpus, one of the largest parallel corpora available at the moment. It contains 840 million English tokens, and its largest language pair, English-French, has more than 25 million aligned segments (paragraphs).
Anthology ID:
L16-1445
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2799–2803
URL:
https://aclanthology.org/L16-1445
Cite (ACL):
Vít Baisa, Jan Michelfeit, Marek Medveď, and Miloš Jakubíček. 2016. European Union Language Resources in Sketch Engine. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2799–2803, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
European Union Language Resources in Sketch Engine (Baisa et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1445.pdf