EENLP: Cross-lingual Eastern European NLP Index

Alexey Tikhonov, Alex Malkhasov, Andrey Manoshin, George-Andrei Dima, Réka Cserháti, Md.Sadek Hossain Asif, Matt Sárdi


Abstract
Motivated by the sparsity of NLP resources for Eastern European languages, we present a broad index of existing Eastern European language resources (90+ datasets and 45+ models), published as a GitHub repository open for updates from the community. Furthermore, to support the evaluation of commonsense reasoning tasks, we provide hand-crafted cross-lingual datasets for five semantic tasks (news categorization, paraphrase detection, natural language inference (NLI), tweet sentiment detection, and news sentiment detection) for some of the Eastern European languages. We run several experiments with existing multilingual models on these datasets to establish performance baselines and compare them to existing results for other languages.
Anthology ID:
2022.lrec-1.220
Volume:
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month:
June
Year:
2022
Address:
Marseille, France
Venue:
LREC
Publisher:
European Language Resources Association
Pages:
2050–2057
URL:
https://aclanthology.org/2022.lrec-1.220
Cite (ACL):
Alexey Tikhonov, Alex Malkhasov, Andrey Manoshin, George-Andrei Dima, Réka Cserháti, Md.Sadek Hossain Asif, and Matt Sárdi. 2022. EENLP: Cross-lingual Eastern European NLP Index. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 2050–2057, Marseille, France. European Language Resources Association.
Cite (Informal):
EENLP: Cross-lingual Eastern European NLP Index (Tikhonov et al., LREC 2022)
PDF:
https://aclanthology.org/2022.lrec-1.220.pdf
Code
 altsoph/EENLP
Data
GLUE, TaPaCo