David Alfter

2025

Towards a Real-time Swedish Speech Analyzer for Language Learning Games: A Hybrid AI Approach to Language Assessment
Tianyi Geng | David Alfter
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

This paper presents an automatic speech assessment system designed for Swedish language learners. We introduce a novel hybrid approach that integrates Microsoft Azure speech services with open-source Large Language Models (LLMs). Our system is implemented as a web-based application that provides real-time quick assessment with a game-like experience. Through testing against COREFL English corpus data and Swedish L2 speech data, our system demonstrates effectiveness in distinguishing different language proficiencies, closely aligning with CEFR levels. This ongoing work addresses the gap in current low-resource language assessment technologies with a pilot system developed for automated speech analysis.

pdf bib abs

The Need for Truly Graded Lexical Complexity Prediction
David Alfter
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Recent trends in NLP have shifted towards modeling lexical complexity as a continuous value, but practical implementations often remain binary. This opinion piece argues for the importance of truly graded lexical complexity prediction, particularly in language learning. We examine the evolution of lexical complexity modeling, highlighting the “data bottleneck” as a key obstacle. Overcoming this challenge can lead to significant benefits, such as enhanced personalization in language learning and improved text simplification. We call for a concerted effort from the research community to create high-quality, graded complexity datasets and to develop methods that fully leverage continuous complexity modeling, while addressing ethical considerations. By fully embracing the continuous nature of lexical complexity, we can develop more effective, inclusive, and personalized language technologies.

pdf bib abs

GRASP at CoMeDi Shared Task: Multi-Strategy Modeling of Annotator Behavior in Multi-Lingual Semantic Judgments
David Alfter | Mattias Appelgren
Proceedings of Context and Meaning: Navigating Disagreements in NLP Annotation

This paper presents the GRASP team’s systems for the CoMeDi 2025 shared task on disagreement prediction in semantic annotation. The task comprises two subtasks: predicting median similarity scores and mean disagreement scores for word usage across multiple languages including Chinese, English, German, Norwegian, Russian, Spanish, and Swedish. For subtask 1, we implement three approaches: Prochain, a probabilistic chain model predicting sequential judgments; FARM, an ensemble of five fine-tuned XLM-RoBERTa models; and THAT, a task-specific model using XL-Lexeme with adaptive thresholds. For subtask 2, we develop three systems: LAMP, combining language-agnostic and monolingual models; BUMBLE, using optimal language combinations; and DRAMA, leveraging disagreement patterns from FARM’s outputs. Our results show strong performance across both subtasks, ranking second overall among participating teams. The probabilistic Prochain model demonstrates surprisingly robust performance when given accurate initial judgments, while our task-specific approaches show varying effectiveness across languages.

pdf bib abs

Annotating Personal Information in Swedish Texts with SPARV
Maria Irena Szawerna | David Alfter | Elena Volodina
Proceedings of the First on Natural Language Processing and Language Models for Digital Humanities

Digital Humanities (DH) research, among many others, relies on data, a subset of which comes in the form of language data that contains personal information (PI). Working with and sharing such data has ethical and legal implications. The process of removing (anonymization) or replacing (pseudonymization) of personal information in texts may be used to address these issues, and often begins with a PI detection and labeling stage. We present a new tool for personal information detection and labeling for Swedish, SBX-PI-DETECTION (henceforth SBX-PI), alongside a visualization interface, (IM)PERSONAL DATA, which allows for the comparison of outputs from different tools. A valuable feature of SBX-PI is that it enables the users to run the annotation locally. It is also integrated into the text annotation pipeline SPARV, allowing for other types of annotation to be performed simultaneously and contributing to the privacy by design requirement set by the GDPR. A novel feature of (IM)PERSONAL DATA is that it allows researchers to assess the extent of detected PI in a text and how much of it will be manipulated once anonymization or pseudonymization are applied. The tools are primarily aimed at researchers within Digital Humanities and Natural Language Processing and are linked to CLARIN’s Virtual Language Observatory.

pdf bib

Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning
Ricardo Muñoz Sánchez | David Alfter | Elena Volodina | Jelena Kallas
Proceedings of the 14th Workshop on Natural Language Processing for Computer Assisted Language Learning

pdf bib abs

daalft at SemEval-2025 Task 1: Multi-step Zero-shot Multimodal Idiomaticity Ranking
David Alfter
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

This paper presents a multi-step zero-shot system for SemEval-2025 Task 1 on Advancing Multimodal Idiomaticity Representation (AdMIRe). The system employs two state-of-the-art multimodal language models, Claude Sonnet 3.5 and OpenAI GPT-4o, to determine idiomaticity and rank images for relevance in both subtasks. A hybrid approach combining o1-preview for idiomaticity classification and GPT-4o for visual ranking produced the best overall results. The system demonstrates competitive performance on the English extended dataset for Subtask A, but faces challenges in cross-lingual transfer to Portuguese. Comparing Image+Text and Text-Only approaches reveals interesting trends and raises questions about the role of visual information in multimodal idiomaticity detection.

pdf bib abs

GRIPF at TSAR 2025 Shared Task Towards controlled CEFR level simplification with the help of inter-model interactions
David Alfter | Sebastian Gombert
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

In this contribution to the CEFR level simplification TSAR 2025 Shared Task, we propose two systems, EZ-SCALAR and SAGA, that implement two differing approaches to prompting LLMs for proficiency-adapted simplification. Our results place us in the middle of the participating teams, and reveal that using external lexical resources to guide simplification improves overall results.

2024

pdf bib

Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
Elena Volodina | David Alfter | Simon Dobnik | Therese Lindström Tiedemann | Ricardo Muñoz Sánchez | Maria Irena Szawerna | Xuan-Son Vu
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)

pdf bib abs

Automatically Generated Definitions and their utility for Modeling Word Meaning
Francesco Periti | David Alfter | Nina Tahmasebi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Modeling lexical semantics is a challenging task, often suffering from interpretability pitfalls. In this paper, we delve into the generation of dictionary-like sense definitions and explore their utility for modeling word meaning. We fine-tuned two Llama models and include an existing T5-based model in our evaluation. Firstly, we evaluate the quality of the generated definitions on existing English benchmarks, setting new state-of-the-art results for the Definition Generation task. Next, we explore the use of definitions generated by our models as intermediate representations subsequently encoded as sentence embeddings. We evaluate this approach on lexical semantics tasks such as the Word-in-Context, Word Sense Induction, and Lexical Semantic Change, setting new state-of-the-art results in all three tasks when compared to unsupervised baselines.

pdf bib abs

More DWUGs: Extending and Evaluating Word Usage Graph Datasets in Multiple Languages
Dominik Schlechtweg | Pierluigi Cassotti | Bill Noble | David Alfter | Sabine Schulte Im Walde | Nina Tahmasebi
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

Word Usage Graphs (WUGs) represent human semantic proximity judgments for pairs of word uses in a weighted graph, which can be clustered to infer word sense clusters from simple pairwise word use judgments, avoiding the need for word sense definitions. SemEval-2020 Task 1 provided the first and to date largest manually annotated, diachronic WUG dataset. In this paper, we check the robustness and correctness of the annotations by continuing the SemEval annotation algorithm for two more rounds and comparing against an established annotation paradigm. Further, we test the reproducibility by resampling a new, smaller set of word uses from the SemEval source corpora and annotating them. Our work contributes to a better understanding of the problems and opportunities of the WUG annotation paradigm and points to future improvements.

pdf bib abs

TCFLE-8 : un corpus de productions écrites d’apprenants de français langue étrangère et son application à la correction automatisée de textes
Rodrigo Wilkens | Alice Pintard | David Alfter | Vincent Folny | Thomas François
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

La correction automatisée de textes (CAT) vise à évaluer automatiquement la qualité de textes écrits. L’automatisation permet une évaluation à grande échelle ainsi qu’une amélioration de la cohérence, de la fiabilité et de la normalisation du processus. Ces caractéristiques sont particulièrement importantes dans le contexte des examens de certification linguistique. Cependant, un goulot d’étranglement majeur dans le développement des systèmes CAT est la disponibilité des corpus. Dans cet article, nous visons à encourager le développement de systèmes de correction automatique en fournissant le corpus TCFLE-8, un corpus de 6~569 essais collectés dans le contexte de l’examen de certification Test de Connaissance du Français (TCF). Nous décrivons la procédure d’évaluation stricte qui a conduit à la notation de chaque essai par au moins deux évaluateurs selon l’échelle du Cadre européen commun de référence pour les langues (CECR) et à la création d’un corpus équilibré. Nous faisons également progresser les performances de l’état de l’art pour la tâche de CAT en français en expérimentant deux solides modèles de référence.

pdf bib

Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change
Nina Tahmasebi | Syrielle Montariol | Andrey Kutuzov | David Alfter | Francesco Periti | Pierluigi Cassotti | Netta Huebscher
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change

pdf bib

Complexity and Indecision: A Proof-of-Concept Exploration of Lexical Complexity and Lexical Semantic Change
David Alfter
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change

pdf bib

pdf bib

Out-of-the-Box Graded Vocabulary Lists with Generative Language Models: Fact or Fiction?
David Alfter
Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning

pdf bib

Jingle BERT, Frozen All the Way: Freezing Layers to Identify CEFR Levels of Second Language Learners Using BERT
Ricardo Muñoz Sánchez | David Alfter | Simon Dobnik | Maria Irena Szawerna | Elena Volodina
Proceedings of the 13th Workshop on Natural Language Processing for Computer Assisted Language Learning

2023

pdf bib abs

TCFLE-8: a Corpus of Learner Written Productions for French as a Foreign Language and its Application to Automated Essay Scoring
Rodrigo Wilkens | Alice Pintard | David Alfter | Vincent Folny | Thomas François
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Automated Essay Scoring (AES) aims to automatically assess the quality of essays. Automation enables large-scale assessment, improvements in consistency, reliability, and standardization. Those characteristics are of particular relevance in the context of language certification exams. However, a major bottleneck in the development of AES systems is the availability of corpora, which, unfortunately, are scarce, especially for languages other than English. In this paper, we aim to foster the development of AES for French by providing the TCFLE-8 corpus, a corpus of 6.5k essays collected in the context of the Test de Connaissance du Français (TCF - French Knowledge Test) certification exam. We report the strict quality procedure that led to the scoring of each essay by at least two raters according to the CEFR levels and to the creation of a balanced corpus. In addition, we describe how linguistic properties of the essays relate to the learners’ proficiency in TCFLE-8. We also advance the state-of-the-art performance for the AES task in French by experimenting with two strong baselines (i.e. RoBERTa and feature-based). Finally, we discuss the challenges of AES using TCFLE-8.

pdf bib abs

Annotation Linguistique pour l’Évaluation de la Simplification Automatique de Textes
Rémi Cardon | Adrien Bibal | Rodrigo Wilkens | David Alfter | Magali Norré | Adeline Müller | Patrick Watrin | Thomas François
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 4 : articles déjà soumis ou acceptés en conférence internationale

L’évaluation des systèmes de simplification automatique de textes (SAT) est une tâche difficile, accomplie à l’aide de métriques automatiques et du jugement humain. Cependant, d’un point de vue linguistique, savoir ce qui est concrètement évalué n’est pas clair. Nous proposons d’annoter un des corpus de référence pour la SAT, ASSET, que nous utilisons pour éclaircir cette question. En plus de la contribution que constitue la ressource annotée, nous montrons comment elle peut être utilisée pour analyser le comportement de SARI, la mesure d’évaluation la plus populaire en SAT. Nous présentons nos conclusions comme une étape pour améliorer les protocoles d’évaluation en SAT à l’avenir.

pdf bib

pdf bib

Proceedings of the 12th Workshop on NLP for Computer Assisted Language Learning
David Alfter | Elena Volodina | Thomas François | Arne Jönsson | Evelina Rennes
Proceedings of the 12th Workshop on NLP for Computer Assisted Language Learning

2022

pdf bib abs

The performance of deep learning models in NLP and other fields of machine learning has led to a rise in their popularity, and so the need for explanations of these models becomes paramount. Attention has been seen as a solution to increase performance, while providing some explanations. However, a debate has started to cast doubt on the explanatory power of attention in neural networks. Although the debate has created a vast literature thanks to contributions from various areas, the lack of communication is becoming more and more tangible. In this paper, we provide a clear overview of the insights on the debate by critically confronting works from these different areas. This holistic vision can be of great interest for future works in all the communities concerned by this debate. We sum up the main challenges spotted in these areas, and we conclude by discussing the most promising future avenues on attention as an explanation.

pdf bib abs

Evaluating automatic text simplification (ATS) systems is a difficult task that is either performed by automatic metrics or user-based evaluations. However, from a linguistic point-of-view, it is not always clear on what bases these evaluations operate. In this paper, we propose annotations of the ASSET corpus that can be used to shed more light on ATS evaluation. In addition to contributing with this resource, we show how it can be used to analyze SARI’s behavior and to re-evaluate existing ATS systems. We present our insights as a step to improve ATS evaluation protocols in the future.

pdf bib abs

L’Attention est-elle de l’Explication ? Une Introduction au Débat (Is Attention Explanation ? An Introduction to the Debate )
Adrien Bibal | Remi Cardon | David Alfter | Rodrigo Wilkens | Xiaoou Wang | Thomas François | Patrick Watrin
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

Nous présentons un résumé en français et un résumé en anglais de l’article Is Attention Explanation ? An Introduction to the Debate (Bibal et al., 2022), publié dans les actes de la conférence 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022).

pdf bib abs

In this paper, we present the FABRA: readability toolkit based on the aggregation of a large number of readability predictor variables. The toolkit is implemented as a service-oriented architecture, which obviates the need for installation, and simplifies its integration into other projects. We also perform a set of experiments to show which features are most predictive on two different corpora, and how the use of aggregators improves performance over standard feature-based readability prediction. Our experiments show that, for the explored corpora, the most important predictors for native texts are measures of lexical diversity, dependency counts and text coherence, while the most important predictors for foreign texts are syntactic variables illustrating language development, as well as features linked to lexical sophistication. FABRA: have the potential to support new research on readability assessment for French.

pdf bib

pdf bib

Towards a Verb Profile: distribution of verbal tenses in FFL textbooks and in learner productions
Nami Yamaguchi | David Alfter | Kaori Sugiyama | Thomas François
Proceedings of the 11th Workshop on NLP for Computer Assisted Language Learning

pdf bib

Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference
Rodrigo Wilkens | David Alfter | Rémi Cardon | Núria Gala
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference

pdf bib abs

A Dictionary-Based Study of Word Sense Difficulty
David Alfter | Rémi Cardon | Thomas François
Proceedings of the 2nd Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI) within the 13th Language Resources and Evaluation Conference

In this article, we present an exploratory study on perceived word sense difficulty by native and non-native speakers of French. We use a graded lexicon in conjunction with the French Wiktionary to generate tasks in bundles of four items. Annotators manually rate the difficulty of the word senses based on their usage in a sentence by selecting the easiest and the most difficult word sense out of four. Our results show that the native and non-native speakers largely agree when it comes to the difficulty of words. Further, the rankings derived from the manual annotation broadly follow the levels of the words in the graded resource, although these levels were not overtly available to annotators. Using clustering, we investigate whether there is a link between the complexity of a definition and the difficulty of the associated word sense. However, results were inconclusive. The annotated data set is available for research purposes.

pdf bib abs

CENTAL at TSAR-2022 Shared Task: How Does Context Impact BERT-Generated Substitutions for Lexical Simplification?
Rodrigo Wilkens | David Alfter | Rémi Cardon | Isabelle Gribomont | Adrien Bibal | Watrin Patrick | Marie-Catherine De marneffe | Thomas François
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Lexical simplification is the task of substituting a difficult word with a simpler equivalent for a target audience. This is currently commonly done by modeling lexical complexity on a continuous scale to identify simpler alternatives to difficult words. In the TSAR shared task, the organizers call for systems capable of generating substitutions in a zero-shot-task context, for English, Spanish and Portuguese. In this paper, we present the solution we (the cental team) proposed for the task. We explore the ability of BERT-like models to generate substitution words by masking the difficult word. To do so, we investigate various context enhancement strategies, that we combined into an ensemble method. We also explore different substitution ranking methods. We report on a post-submission analysis of the results and present our insights for potential improvements. The code for all our experiments is available at https://gitlab.com/Cental-FR/cental-tsar2022.

2021

pdf bib

Crowdsourcing Relative Rankings of Multi-Word Expressions: Experts versus Non-Experts
David Alfter | Therese Lindström Tiedemann | Elena Volodina
Northern European Journal of Language Technology, Volume 7

pdf bib

Proceedings of the 10th Workshop on NLP for Computer Assisted Language Learning
David Alfter | Elena Volodina | Ildikó Pilan | Johannes Graën | Lars Borin
Proceedings of the 10th Workshop on NLP for Computer Assisted Language Learning

2020

pdf bib abs

Using Multilingual Resources to Evaluate CEFRLex for Learner Applications
Johannes Graën | David Alfter | Gerold Schneider
Proceedings of the Twelfth Language Resources and Evaluation Conference

The Common European Framework of Reference for Languages (CEFR) defines six levels of learner proficiency, and links them to particular communicative abilities. The CEFRLex project aims at compiling lexical resources that link single words and multi-word expressions to particular CEFR levels. The resources are thought to reflect second language learner needs as they are compiled from CEFR-graded textbooks and other learner-directed texts. In this work, we investigate the applicability of CEFRLex resources for building language learning applications. Our main concerns were that vocabulary in language learning materials might be sparse, i.e. that not all vocabulary items that belong to a particular level would also occur in materials for that level, and, on the other hand, that vocabulary items might be used on lower-level materials if required by the topic (e.g. with a simpler paraphrasing or translation). Our results indicate that the English CEFRLex resource is in accordance with external resources that we jointly employ as gold standard. Together with other values obtained from monolingual and parallel corpora, we can indicate which entries need to be adjusted to obtain values that are even more in line with this gold standard. We expect that this finding also holds for the other languages

pdf bib

Proceedings of the 9th Workshop on NLP for Computer Assisted Language Learning
David Alfter | Elena Volodina | Ildikó Pilan | Herbert Lange | Lars Borin
Proceedings of the 9th Workshop on NLP for Computer Assisted Language Learning

2019

pdf bib abs

Interconnecting lexical resources and word alignment: How do learners get on with particle verbs?
David Alfter | Johannes Graën
Proceedings of the 22nd Nordic Conference on Computational Linguistics

In this paper, we present a prototype for an online exercise aimed at learners of English and Swedish that serves multiple purposes. The exercise allows learners of the aforementioned languages to train their knowledge of particle verbs receiving clues from the exercise application. The user themselves decide which clue to receive and pay in virtual currency for each, which provides us with valuable information about the utility of the clues that we provide as well as the learners willingness to trade virtual currency versus accuracy of their choice. As resources, we use list with annotated levels from the proficiency scale defined by the Common European Framework of Reference (CEFR) and a multilingual corpus with syntactic dependency relations and word annotation for all language pairs. From the latter resource, we extract translation equivalents for particle verb construction together with a list of parallel corpus examples that can be used as clues in the exercise.

pdf bib abs

LEGATO: A flexible lexicographic annotation tool
David Alfter | Therese Lindström Tiedemann | Elena Volodina
Proceedings of the 22nd Nordic Conference on Computational Linguistics

This article is a report from an ongoing project aiming at analyzing lexical and grammatical competences of Swedish as a Second language (L2). To facilitate lexical analysis, we need access to metalinguistic information about relevant vocabulary that L2 learners can use and understand. The focus of the current article is on the lexical annotation of the vocabulary scope for a range of lexicographical aspects, such as morphological analysis, valency, types of multi-word units, etc. We perform parts of the analysis automatically, and other parts manually. The rationale behind this is that where there is no possibility to add information automatically, manual effort needs to be added. To facilitate the latter, a tool LEGATO has been designed, implemented and currently put to active testing.

pdf bib

Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning
David Alfter | Elena Volodina | Lars Borin | Ildikó Pilan | Herbert Lange
Proceedings of the 8th Workshop on NLP for Computer Assisted Language Learning

2018

pdf bib abs

Towards Single Word Lexical Complexity Prediction
David Alfter | Elena Volodina
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper we present work-in-progress where we investigate the usefulness of previously created word lists to the task of single-word lexical complexity analysis and prediction of the complexity level for learners of Swedish as a second language. The word lists used map each word to a single CEFR level, and the task consists of predicting CEFR levels for unseen words. In contrast to previous work on word-level lexical complexity, we experiment with topics as additional features and show that linking words to topics significantly increases accuracy of classification.

pdf bib abs

SB@GU at the Complex Word Identification 2018 Shared Task
David Alfter | Ildikó Pilán
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

In this paper, we describe our experiments for the Shared Task on Complex Word Identification (CWI) 2018 (Yimam et al., 2018), hosted by the 13th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at NAACL 2018. Our system for English builds on previous work for Swedish concerning the classification of words into proficiency levels. We investigate different features for English and compare their usefulness using feature selection methods. For the German, Spanish and French data we use simple systems based on character n-gram models and show that sometimes simple models achieve comparable results to fully feature-engineered systems.

pdf bib

Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning
Ildikó Pilán | Elena Volodina | David Alfter | Lars Borin
Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning

2016

pdf bib abs

Coursebook Texts as a Helping Hand for Classifying Linguistic Complexity in Language Learners’ Writings
Ildikó Pilán | David Alfter | Elena Volodina
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

We bring together knowledge from two different types of language learning data, texts learners read and texts they write, to improve linguistic complexity classification in the latter. Linguistic complexity in the foreign and second language learning context can be expressed in terms of proficiency levels. We show that incorporating features capturing lexical complexity information from reading passages can boost significantly the machine learning based classification of learner-written texts into proficiency levels. With an F1 score of .8 our system rivals state-of-the-art results reported for other languages for this task. Finally, we present a freely available web-based tool for proficiency level classification and lexical complexity visualization for both learner writings and reading texts.

pdf bib

From distributions to labels: A lexical proficiency analysis using learner corpora
David Alfter | Yuri Bizzoni | Anders Agebjörn | Elena Volodina | Ildikó Pilán
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition