Amandine Dumont
2024
Generating Contexts for ESP Vocabulary Exercises with LLMs
Iglika Nikolova-Stoupak | Serge Bibauw | Amandine Dumont | Françoise Stas | Patrick Watrin | Thomas François
Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning
LLM-Generated Contexts to Practice Specialised Vocabulary: Corpus Presentation and Comparison
Iglika Nikolova-Stoupak | Serge Bibauw | Amandine Dumont | Françoise Stas | Patrick Watrin | Thomas François
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position
This project evaluates the potential of LLMs and dynamic corpora to generate contexts aimed at the practice and acquisition of specialised English vocabulary. We compared reference contexts for a specialised vocabulary list, handpicked by expert teachers, with contexts generated by three recent large language models (LLMs) of different sizes (Mistral-7B-Instruct, Vicuna-13B, and Gemini 1.0 Pro) and with contexts extracted from articles web-crawled from specialised websites. The comparison uses a representative set of length-based, morphosyntactic, semantic, and discourse-related textual characteristics. We conclude that the LLM-based corpora can be combined effectively with a web-crawled one to form an academic corpus characterised by appropriate complexity and textual variety.