Shakib Yazdani
2026
A Comprehensive Evaluation of Chain-of-Thought Faithfulness in Persian Classification Tasks
Shakib Yazdani | Cristina España-Bonet | Eleftherios Avramidis | Yasser Hamidullah | Josef van Genabith
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Large language models (LLMs) have shown remarkable performance when prompted to reason step by step, commonly referred to as chain-of-thought (CoT) reasoning. While prior work has proposed mechanism-level approaches to evaluate CoT faithfulness, these studies have primarily focused on English, leaving low-resource languages such as Persian largely underexplored. In this paper, we present the first comprehensive study of CoT faithfulness in Persian. Our analysis spans 15 classification datasets and 6 language models across three classes (small, large, and reasoning models), evaluated under both English and Persian prompting conditions. We first assess model performance on each dataset while collecting the corresponding CoT traces and final predictions. We then evaluate the faithfulness of these CoT traces using an LLM-as-a-judge approach, followed by a human evaluation to measure agreement between the LLM-based judge and a human annotator. Our results reveal substantial variation in CoT faithfulness across tasks, datasets, and model classes. In particular, faithfulness is strongly influenced by the dataset and the language model class, while the language used for prompting has a comparatively smaller effect. Notably, small language models exhibit lower or comparable faithfulness scores relative to large language models and reasoning models.
2025
Continual Learning in Multilingual Sign Language Translation
Shakib Yazdani | Josef van Genabith | Cristina España-Bonet
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
The field of sign language translation (SLT) is still in its infancy, as evidenced by the low translation quality, even when using deep learning approaches. Probably because of this, many common approaches in other machine learning fields have not been explored in sign language. Here, we focus on continual learning for multilingual SLT. We experiment with three continual learning methods and compare them to four more naive baseline and fine-tuning approaches. We work with four sign languages (ASL, BSL, CSL and DGS) and three spoken languages (Chinese, English and German). Our results show that incremental fine-tuning is the best performing approach both in terms of translation quality and transfer capabilities, and that continual learning approaches are not yet fully competitive given the current SOTA in SLT.
Seeing, Signing, and Saying: A Vision-Language Model-Assisted Pipeline for Sign Language Data Acquisition and Curation from Social Media
Shakib Yazdani | Yasser Hamidullah | Cristina España-Bonet | Josef van Genabith
Proceedings of the 15th International Conference on Recent Advances in Natural Language Processing - Natural Language Processing in the Generative AI Era
Most existing sign language translation (SLT) datasets are limited in scale, lack multilingual coverage, and are costly to curate due to their reliance on expert annotation and controlled recording setups. Recently, Vision Language Models (VLMs) have demonstrated strong capabilities as evaluators and real-time assistants. Despite these advancements, their potential remains untapped in the context of sign language dataset acquisition. To bridge this gap, we introduce the first automated annotation and filtering framework that utilizes VLMs to reduce reliance on manual effort while preserving data quality. Our method is applied to TikTok videos across eight sign languages and to the already curated YouTube-SL-25 dataset in German Sign Language for the purpose of additional evaluation. Our VLM-based pipeline includes face visibility detection, sign activity recognition, text extraction from video content, and a judgment step to validate alignment between video and text, implementing generic filtering, annotation, and validation steps. Using the resulting corpus, TikTok-SL-8, we assess the performance of two off-the-shelf SLT models on our filtered dataset for German and American Sign Languages, with the goal of establishing baselines and evaluating the robustness of recent models on automatically extracted, slightly noisy data. Our work enables scalable, weakly supervised pretraining for SLT and facilitates data acquisition from social media.
SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision
Yasser Hamidullah | Shakib Yazdani | Cennet Oguz | Josef van Genabith | Cristina España-Bonet
Proceedings of the Tenth Conference on Machine Translation
Sign language translation (SLT) is typically trained with text in a single spoken language, which limits scalability and cross-language generalization. Earlier approaches have replaced gloss supervision with text-based sentence embeddings, but up to now, these remain tied to a specific language and modality. In contrast, here we employ language-agnostic, multimodal embeddings trained on text and speech from multiple languages to supervise SLT, enabling direct multilingual translation. To address data scarcity, we propose a coupled augmentation method that combines multilingual target augmentations (i.e. translations into many languages) with video-level perturbations, improving model robustness. Experiments show consistent BLEURT gains over text-only sentence embedding supervision, with larger improvements in low-resource settings. Our results demonstrate that language-agnostic embedding supervision, combined with coupled augmentation, provides a scalable and semantically robust alternative to traditional SLT training.