MuLVE, A Multi-Language Vocabulary Evaluation Data Set

Anik Jacobsen; Salar Mohtaj; Sebastian Möller

Abstract

Vocabulary learning is vital to foreign language learning, and correct, adequate feedback is essential to successful and satisfying vocabulary training. However, many vocabulary and language evaluation systems rely on simple rules and do not account for real-life user learning data. This work introduces the Multi-Language Vocabulary Evaluation Data Set (MuLVE), a data set consisting of vocabulary cards and real-life user answers, each labeled as correct or incorrect. The data source is user learning data from the Phase6 vocabulary trainer. The data set contains vocabulary questions in German, with English, Spanish, and French as target languages, and is available in four variations that differ in pre-processing and deduplication. We fine-tune pre-trained BERT language models on the downstream task of vocabulary evaluation using the proposed MuLVE data set. The fine-tuned models achieve accuracy and F2-score above 95.5. The data set is available on the European Language Grid.

Anthology ID: 2022.lrec-1.70
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 673–679
URL: https://aclanthology.org/2022.lrec-1.70/
Cite (ACL): Anik Jacobsen, Salar Mohtaj, and Sebastian Möller. 2022. MuLVE, A Multi-Language Vocabulary Evaluation Data Set. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 673–679, Marseille, France. European Language Resources Association.
Cite (Informal): MuLVE, A Multi-Language Vocabulary Evaluation Data Set (Jacobsen et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.70.pdf
