%0 Conference Proceedings %T The MARCELL Legislative Corpus %A Váradi, Tamás %A Koeva, Svetla %A Yamalov, Martin %A Tadić, Marko %A Sass, Bálint %A Nitoń, Bartłomiej %A Ogrodniczuk, Maciej %A Pęzik, Piotr %A Barbu Mititelu, Verginica %A Ion, Radu %A Irimia, Elena %A Mitrofan, Maria %A Păiș, Vasile %A Tufiș, Dan %A Garabík, Radovan %A Krek, Simon %A Repar, Andraz %A Rihtar, Matjaž %A Brank, Janez %Y Calzolari, Nicoletta %Y Béchet, Frédéric %Y Blache, Philippe %Y Choukri, Khalid %Y Cieri, Christopher %Y Declerck, Thierry %Y Goggi, Sara %Y Isahara, Hitoshi %Y Maegaard, Bente %Y Mariani, Joseph %Y Mazo, Hélène %Y Moreno, Asuncion %Y Odijk, Jan %Y Piperidis, Stelios %S Proceedings of the Twelfth Language Resources and Evaluation Conference %D 2020 %8 May %I European Language Resources Association %C Marseille, France %@ 979-10-95546-34-4 %G English %F varadi-etal-2020-marcell %X This article presents the current outcomes of the MARCELL CEF Telecom project aiming to collect and deeply annotate a large comparable corpus of legal documents. The MARCELL corpus includes 7 monolingual sub-corpora (Bulgarian, Croatian, Hungarian, Polish, Romanian, Slovak and Slovenian) containing the total body of respective national legislative documents. These sub-corpora are automatically sentence split, tokenized, lemmatized and morphologically and syntactically annotated. The monolingual sub-corpora are complemented by a thematically related parallel corpus (Croatian-English). The metadata and the annotations are uniformly provided for each language specific sub-corpus. Besides the standard morphosyntactic analysis plus named entity and dependency annotation, the corpus is enriched with the IATE and EUROVOC labels. The file format is CoNLL-U Plus Format, containing the ten columns specific to the CoNLL-U format and four extra columns specific to our corpora. 
The MARCELL corpora represent a rich and valuable source for further studies and developments in machine learning, cross-lingual terminological data extraction and classification. %U https://aclanthology.org/2020.lrec-1.464 %P 3761-3768