@inproceedings{wang-etal-2025-zero,
title = "Zero-Shot Evaluation of Conversational Language Competence in Data-Efficient {LLM}s Across {E}nglish, {M}andarin, and {F}rench",
author = "Wang, Sheng-Fu and
Huang, Ri-Sheng and
Hsieh, Shu-Kai and
Pr{\'e}vot, Laurent",
editor = "B{\'e}chet, Fr{\'e}d{\'e}ric and
Lef{\`e}vre, Fabrice and
Asher, Nicholas and
Kim, Seokhwan and
Merlin, Teva",
booktitle = "Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue",
month = aug,
year = "2025",
address = "Avignon, France",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.sigdial-1.3/",
pages = "32--47",
abstract = "Large Language Models (LLMs) have achieved outstanding performance across various natural language processing tasks, including those from Discourse and Dialogue traditions. However, these achievements are typically obtained thanks to pretraining on huge datasets. In contrast, humans learn to speak and communicate through dialogue and spontaneous speech with only a fraction of the language exposure. This disparity has spurred interest in evaluating whether smaller, more carefully selected and curated pretraining datasets can support robust performance on specific tasks. Drawing inspiration from the BabyLM initiative, we construct small (10M-token) pretraining datasets from different sources, including conversational transcripts and Wikipedia-style text. To assess the impact of these datasets, we develop evaluation benchmarks focusing on discourse and interactional markers, extracted from high-quality spoken corpora in English, French, and Mandarin. Employing a zero-shot classification framework inspired by the BLiMP benchmark, we design tasks wherein the model must determine, between a genuine utterance extracted from a corpus and its minimally altered counterpart, which one is the authentic instance. Our findings reveal that the nature of pretraining data significantly influences model performance on discourse-related tasks. Models pretrained on conversational data exhibit a clear advantage in handling discourse and interactional markers compared to those trained on written or encyclopedic text. Furthermore, the models, trained on small amounts of spontaneous speech transcripts, perform comparably to standard LLMs."
}
<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
<mods ID="wang-etal-2025-zero">
<titleInfo>
<title>Zero-Shot Evaluation of Conversational Language Competence in Data-Efficient LLMs Across English, Mandarin, and French</title>
</titleInfo>
<name type="personal">
<namePart type="given">Sheng-Fu</namePart>
<namePart type="family">Wang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Ri-Sheng</namePart>
<namePart type="family">Huang</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Shu-Kai</namePart>
<namePart type="family">Hsieh</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Laurent</namePart>
<namePart type="family">Prévot</namePart>
<role>
<roleTerm authority="marcrelator" type="text">author</roleTerm>
</role>
</name>
<originInfo>
<dateIssued>2025-08</dateIssued>
</originInfo>
<typeOfResource>text</typeOfResource>
<relatedItem type="host">
<titleInfo>
<title>Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue</title>
</titleInfo>
<name type="personal">
<namePart type="given">Frédéric</namePart>
<namePart type="family">Béchet</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Fabrice</namePart>
<namePart type="family">Lefèvre</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Nicholas</namePart>
<namePart type="family">Asher</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Seokhwan</namePart>
<namePart type="family">Kim</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<name type="personal">
<namePart type="given">Teva</namePart>
<namePart type="family">Merlin</namePart>
<role>
<roleTerm authority="marcrelator" type="text">editor</roleTerm>
</role>
</name>
<originInfo>
<publisher>Association for Computational Linguistics</publisher>
<place>
<placeTerm type="text">Avignon, France</placeTerm>
</place>
</originInfo>
<genre authority="marcgt">conference publication</genre>
</relatedItem>
<abstract>Large Language Models (LLMs) have achieved outstanding performance across various natural language processing tasks, including those from Discourse and Dialogue traditions. However, these achievements are typically obtained thanks to pretraining on huge datasets. In contrast, humans learn to speak and communicate through dialogue and spontaneous speech with only a fraction of the language exposure. This disparity has spurred interest in evaluating whether smaller, more carefully selected and curated pretraining datasets can support robust performance on specific tasks. Drawing inspiration from the BabyLM initiative, we construct small (10M-token) pretraining datasets from different sources, including conversational transcripts and Wikipedia-style text. To assess the impact of these datasets, we develop evaluation benchmarks focusing on discourse and interactional markers, extracted from high-quality spoken corpora in English, French, and Mandarin. Employing a zero-shot classification framework inspired by the BLiMP benchmark, we design tasks wherein the model must determine, between a genuine utterance extracted from a corpus and its minimally altered counterpart, which one is the authentic instance. Our findings reveal that the nature of pretraining data significantly influences model performance on discourse-related tasks. Models pretrained on conversational data exhibit a clear advantage in handling discourse and interactional markers compared to those trained on written or encyclopedic text. Furthermore, the models, trained on small amounts of spontaneous speech transcripts, perform comparably to standard LLMs.</abstract>
<identifier type="citekey">wang-etal-2025-zero</identifier>
<location>
<url>https://aclanthology.org/2025.sigdial-1.3/</url>
</location>
<part>
<date>2025-08</date>
<extent unit="page">
<start>32</start>
<end>47</end>
</extent>
</part>
</mods>
</modsCollection>
%0 Conference Proceedings
%T Zero-Shot Evaluation of Conversational Language Competence in Data-Efficient LLMs Across English, Mandarin, and French
%A Wang, Sheng-Fu
%A Huang, Ri-Sheng
%A Hsieh, Shu-Kai
%A Prévot, Laurent
%Y Béchet, Frédéric
%Y Lefèvre, Fabrice
%Y Asher, Nicholas
%Y Kim, Seokhwan
%Y Merlin, Teva
%S Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue
%D 2025
%8 August
%I Association for Computational Linguistics
%C Avignon, France
%F wang-etal-2025-zero
%X Large Language Models (LLMs) have achieved outstanding performance across various natural language processing tasks, including those from Discourse and Dialogue traditions. However, these achievements are typically obtained thanks to pretraining on huge datasets. In contrast, humans learn to speak and communicate through dialogue and spontaneous speech with only a fraction of the language exposure. This disparity has spurred interest in evaluating whether smaller, more carefully selected and curated pretraining datasets can support robust performance on specific tasks. Drawing inspiration from the BabyLM initiative, we construct small (10M-token) pretraining datasets from different sources, including conversational transcripts and Wikipedia-style text. To assess the impact of these datasets, we develop evaluation benchmarks focusing on discourse and interactional markers, extracted from high-quality spoken corpora in English, French, and Mandarin. Employing a zero-shot classification framework inspired by the BLiMP benchmark, we design tasks wherein the model must determine, between a genuine utterance extracted from a corpus and its minimally altered counterpart, which one is the authentic instance. Our findings reveal that the nature of pretraining data significantly influences model performance on discourse-related tasks. Models pretrained on conversational data exhibit a clear advantage in handling discourse and interactional markers compared to those trained on written or encyclopedic text. Furthermore, the models, trained on small amounts of spontaneous speech transcripts, perform comparably to standard LLMs.
%U https://aclanthology.org/2025.sigdial-1.3/
%P 32-47
Markdown (Informal)
[Zero-Shot Evaluation of Conversational Language Competence in Data-Efficient LLMs Across English, Mandarin, and French](https://aclanthology.org/2025.sigdial-1.3/) (Wang et al., SIGDIAL 2025)
ACL