Serge Bibauw


2024

Generating Contexts for ESP Vocabulary Exercises with LLMs
Iglika Nikolova-Stoupak | Serge Bibauw | Amandine Dumont | Françoise Stas | Patrick Watrin | Thomas François
Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning

LLM-Generated Contexts to Practice Specialised Vocabulary: Corpus Presentation and Comparison
Iglika Nikolova-Stoupak | Serge Bibauw | Amandine Dumont | Françoise Stas | Patrick Watrin | Thomas François
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

This project evaluates the potential of LLMs and dynamic corpora to generate contexts aimed at the practice and acquisition of specialised English vocabulary. We compared reference contexts—handpicked by expert teachers—for a specialised vocabulary list to contexts generated by three recent large language models (LLMs) of different sizes (Mistral-7B-Instruct, Vicuna-13B, and Gemini 1.0 Pro) and to contexts extracted from articles web-crawled from specialised websites. The comparison uses a representative set of length-based, morphosyntactic, semantic, and discourse-related textual characteristics. We conclude that the LLM-based corpora can be combined effectively with a web-crawled one to form an academic corpus characterised by appropriate complexity and textual variety.

2023

The BEA 2023 Shared Task on Generating AI Teacher Responses in Educational Dialogues
Anaïs Tack | Ekaterina Kochmar | Zheng Yuan | Serge Bibauw | Chris Piech
Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)

This paper describes the results of the first shared task on generation of teacher responses in educational dialogues. The goal of the task was to benchmark the ability of generative language models to act as AI teachers, replying to a student in a teacher-student dialogue. Eight teams participated in the competition hosted on CodaLab and experimented with a wide variety of state-of-the-art models, including Alpaca, Bloom, DialoGPT, DistilGPT-2, Flan-T5, GPT-2, GPT-3, GPT-4, LLaMA, OPT-2.7B, and T5-base. Their submissions were automatically scored using BERTScore and DialogRPT metrics, and the top three among them were further manually evaluated in terms of pedagogical ability based on Tack and Piech (2022). The NAISTeacher system, which ranked first in both automated and human evaluation, generated responses with GPT-3.5 Turbo using an ensemble of prompts and DialogRPT-based ranking of responses for given dialogue contexts. Despite promising achievements of the participating teams, the results also highlight the need for evaluation metrics better suited to educational contexts.

2017

Not All Dialogues are Created Equal: Instance Weighting for Neural Conversational Models
Pierre Lison | Serge Bibauw
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Neural conversational models require substantial amounts of dialogue data to estimate their parameters and are therefore usually learned on large corpora such as chat forums or movie subtitles. These corpora are, however, often challenging to work with, notably due to their frequent lack of turn segmentation and the presence of multiple references external to the dialogue itself. This paper shows that these challenges can be mitigated by adding a weighting model into the architecture. The weighting model, which is itself estimated from dialogue data, associates each training example with a numerical weight that reflects its intrinsic quality for dialogue modelling. At training time, these sample weights are included in the empirical loss to be minimised. Evaluation results on retrieval-based models trained on movie and TV subtitles demonstrate that the inclusion of such a weighting model improves the model performance on unsupervised metrics.
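The core idea of including sample weights in the empirical loss can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the example weights are hypothetical, and in the paper the weights come from a learned weighting model rather than being fixed by hand.

```python
# Minimal sketch of instance weighting in an empirical loss:
# each training example's loss term is scaled by a quality weight,
# so low-quality dialogue samples contribute less to the gradient.

def weighted_empirical_loss(per_example_losses, quality_weights):
    """Weighted mean of per-example losses (weights reflect sample quality)."""
    assert len(per_example_losses) == len(quality_weights)
    total = sum(w * loss for loss, w in zip(per_example_losses, quality_weights))
    return total / sum(quality_weights)

# Hypothetical example: the noisier dialogue sample gets a lower weight,
# pulling the aggregate loss toward the high-quality example.
losses = [0.9, 0.3]     # per-example losses for two dialogue samples
weights = [0.2, 1.0]    # quality weights (low = noisy sample)
print(weighted_empirical_loss(losses, weights))  # 0.4
```

With uniform weights this reduces to the ordinary mean loss; the weighting only changes which samples dominate training.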