Rui He

2026

Exploring the Semantic Space of Second Language Learners
Trisha Godara | Rui He | Wolfram Hinzen | Yan Cong
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

While the semantic space has been examined as a way to computationally represent language meaning-grammar interface, minimal research has been done comparing the semantic spaces of first and second language learners. We investigated the semantic space of university-level students learning French by extracting semantic features from narrative text over various time points from a 21-month period. After using machine learning models to classify native speakers’ semantic features from second language learners’, we used interpretability techniques to identify the most informative features per model. Through this, we discovered a variety of embedding similarity features to be decisive in language learning. We compared both groups to determine how the features differed per group and if there was any change over time. The findings demonstrated that the second language learners on average had higher semantic similarity scores than the native speakers at the token level. The similarity decreased over time but did not reach native-level values. Similarly, average surprisal was higher in the second language learner group, which steadily decreased over the course of the data collection period. These results provide insight into personalized education with more precise and effective computational indices tracking learners’ progress.

2025

pdf bib abs

Dialogue text segmentation aims to partition dialogue content into consecutive paragraphs based on themes or logic, enhancing its comprehensibility and manageability. Current text segmentation models, when applied directly to STS (Streaming Text Segmentation), exhibit numerous limitations, such as imbalances in labels that affect the stability of model training, and discrepancies between the model’s training tasks (sentence classification) and the actual text segmentation that limit the model’s segmentation capabilities.To address these challenges, we first implement STS for the first time using a sliding window-based segmentation method. Secondly, we employ two different levels of sliding window-based balanced label strategies to stabilize the training process of the streaming segmentation model and enhance training convergence speed. Finally, by adding a one-dimensional bounding-box regression task for text sequences within the window, we restructure the training approach of STS tasks, shifting from sentence classification to sequence segmentation, thereby aligning the training objectives with the task objectives, which further enhanced the model’s performance. Extensive experimental results demonstrate that our method is robust, controllable, and achieves state-of-the-art performance.

Co-authors

Hua Xu 1

Venues

EACL1
Findings1

Fix author