Aitor Álvarez

Also published as: Aitor Arronte Álvarez, Aitor Arronte Alvarez

2025

Automated L2 Proficiency Scoring: Weak Supervision, Large Language Models, and Statistical Guarantees
Aitor Arronte Alvarez | Naiyi Xie Fincham
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

Weakly supervised learning (WSL) is a machine learning approach used when labeled data is scarce or expensive to obtain. In such scenarios, models are trained using weaker supervision sources instead of human-annotated data. However, these sources are often noisy and may introduce unquantified biases during training. This issue is particularly pronounced in automated scoring (AS) of second language (L2) learner output, where high variability and limited generalizability pose significant challenges. In this paper, we investigate analytical scoring of L2 learner responses under weak and semi-supervised learning conditions, leveraging Prediction-Powered Inference (PPI) to provide statistical guarantees on score validity. We compare two approaches: (1) synthetic scoring using large language models (LLMs), and (2) a semi-supervised setting in which a machine learning model, trained on a small gold-standard set, generates predictions for a larger unlabeled corpus. In both cases, PPI is applied to construct valid confidence intervals for assessing the reliability of the predicted scores. Our analysis, based on a dataset of L2 learner conversations with an AI agent, shows that PPI is highly informative for evaluating the quality of weakly annotated data. Moreover, we demonstrate that PPI can increase the effective sample size by over 150% relative to the original human-scored subset, enabling more robust inference in educational assessment settings where labeled data is scarce.
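
The confidence-interval construction described in the abstract can be illustrated for the simplest case, estimating a mean score. The following is a minimal sketch, assuming the standard PPI mean estimator (the mean prediction on the unlabeled set plus the mean human-minus-model residual on the labeled set) with a normal approximation. The function name, argument names and toy scores are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(y_labeled, pred_labeled, pred_unlabeled, alpha=0.05):
    """Prediction-powered point estimate and CI for a mean score.

    y_labeled:      human scores on the small gold-standard set
    pred_labeled:   model (or LLM) scores on that same set
    pred_unlabeled: model (or LLM) scores on the larger unlabeled set
    All names here are illustrative assumptions.
    """
    n, big_n = len(y_labeled), len(pred_unlabeled)
    # Rectifier: how far the predictions are off on the human-scored items.
    rectifier = [y - f for y, f in zip(y_labeled, pred_labeled)]
    # Point estimate: prediction mean, corrected by the mean rectifier.
    theta = fmean(pred_unlabeled) + fmean(rectifier)
    # Standard error combines the uncertainty of both terms.
    se = sqrt(variance(pred_unlabeled) / big_n + variance(rectifier) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)


# Toy data, invented for illustration only.
human = [3, 4, 5, 2]
model_on_labeled = [3.5, 4.0, 4.5, 2.5]
model_on_unlabeled = [3, 4, 3, 4, 3, 4]
estimate, (low, high) = ppi_mean_ci(human, model_on_labeled, model_on_unlabeled)
```

When the predictions track the human scores closely, the rectifier has low variance and the interval is driven mostly by the large unlabeled set, which is the intuition behind the effective-sample-size gain reported above.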

This demo paper presents a prototype of a multilingual, speech-based driver assistant designed to support both English and Basque. The inclusion of Basque, a low-resource language with limited domain-specific training data, marks a significant contribution, as publicly available AI models, including Large Language Models, often underperform for such languages compared to high-resource languages like English. Despite these challenges, our system demonstrates robust performance, successfully understanding user queries and delivering rapid responses in a demanding environment: a car simulator. Notably, the system achieves comparable performance in both English and Basque, showcasing its effectiveness in addressing linguistic disparities in AI-driven applications. A demo of our prototype will be available at the workshop.

2024

Real-Time Speech-Driven Avatar Animation by Predicting Facial landmarks and Deformation Blendshapes
Juan Camilo Vasquez-Correa | Santiago Moreno-Acevedo | Ander Gonzalez-Docasal | Aritz Lasarguren | Jone Lòpez | Egoitz Rodriguez | Aitor Álvarez
Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024)

2016

Impact of Automatic Segmentation on the Quality, Productivity and Self-reported Post-editing Effort of Intralingual Subtitles
Aitor Álvarez | Marina Balenciaga | Arantza del Pozo | Haritz Arzelus | Anna Matamala | Carlos-D. Martínez-Hinarejos
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper describes the evaluation methodology followed to measure the impact of using a machine learning algorithm to automatically segment intralingual subtitles. The segmentation quality, productivity and self-reported post-editing effort achieved with this approach are shown to improve on those obtained with the character-counting technique that is currently the main method for automatic subtitle segmentation. The corpus used to train and test the proposed automated segmentation method is also described and shared with the community, in order to foster further research in this area.

2015

The reception of intralingual and interlingual automatic subtitling: An exploratory study within the HBB4ALL project
Anna Matamala | Andreu Oliver | Aitor Álvarez | Andoni Azpeitia
Proceedings of Translating and the Computer 37

Enriching Digitized Medieval Manuscripts: Linking Image, Text and Lexical Knowledge
Aitor Arronte Álvarez
Proceedings of the 9th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

2014

This paper describes the data collection, annotation and sharing activities carried out within the FP7 EU-funded SAVAS project. The project aims to collect, share and reuse audiovisual language resources from broadcasters and subtitling companies to develop large vocabulary continuous speech recognisers in specific domains and new languages, with the purpose of solving the automated subtitling needs of the media industry.

Phoneme Similarity Matrices to Improve Long Audio Alignment for Automatic Subtitling
Pablo Ruiz | Aitor Álvarez | Haritz Arzelus
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Long audio alignment systems for Spanish and English are presented, within an automatic subtitling application. Language-specific phone decoders automatically recognize audio contents at phoneme level. At the same time, language-dependent grapheme-to-phoneme modules perform a transcription of the script for the audio. A dynamic programming algorithm (Hirschberg’s algorithm) finds matches between the phonemes automatically recognized by the phone decoder and the phonemes in the script’s transcription. Alignment accuracy is evaluated when scoring alignment operations with a baseline binary matrix, and when scoring alignment operations with several continuous-score matrices, based on phoneme similarity as assessed by comparing multivalued phonological features. Alignment accuracy results are reported at phoneme, word and subtitle level. Alignment accuracy with the continuous scoring matrices based on phonological similarity was clearly higher than with the baseline binary matrix.
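
The contrast between binary and similarity-based scoring of alignment operations can be sketched as follows. This is only an illustration under stated assumptions: it uses plain quadratic-space global alignment (Needleman-Wunsch) rather than Hirschberg's linear-space variant named in the abstract, which yields the same optimal score; the phoneme feature sets, the Jaccard-based similarity and the gap penalty are hypothetical choices, not the matrices used in the paper.

```python
def align_score(recognized, reference, sim, gap=-1.0):
    """Optimal global alignment score between two phoneme sequences."""
    rows, cols = len(recognized) + 1, len(reference) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap  # recognized phonemes aligned to gaps
    for j in range(1, cols):
        dp[0][j] = j * gap  # reference phonemes aligned to gaps
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + sim(recognized[i - 1], reference[j - 1]),
                dp[i - 1][j] + gap,  # skip a recognized phoneme
                dp[i][j - 1] + gap,  # skip a reference phoneme
            )
    return dp[-1][-1]


def binary_sim(a, b):
    """Baseline: a pair of phonemes either matches or does not."""
    return 1.0 if a == b else -1.0


# Hypothetical toy feature sets, for illustration only.
FEATURES = {
    "p": {"bilabial", "stop", "voiceless"},
    "b": {"bilabial", "stop", "voiced"},
    "a": {"vowel", "open", "voiced"},
    "s": {"alveolar", "fricative", "voiceless"},
}


def feature_sim(a, b):
    """Continuous score in [-1, 1] from the overlap of feature sets."""
    fa, fb = FEATURES[a], FEATURES[b]
    jaccard = len(fa & fb) / len(fa | fb)
    return 2.0 * jaccard - 1.0


recognized = ["b", "a", "s"]  # decoder output with one confusable phoneme
reference = ["p", "a", "s"]   # grapheme-to-phoneme transcription of the script
binary_total = align_score(recognized, reference, binary_sim)
feature_total = align_score(recognized, reference, feature_sim)
```

With the binary scorer, confusing "b" with "p" is penalized as heavily as any other mismatch, whereas the feature-based scorer penalizes it less because the two phonemes share most of their features.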
