Tsegaye Misikir Tashu
2026
Cross-Lingual Emotion Recognition in Balinese Text using Multilingual-LLMs under Peer-Collaborations Settings
Putu Kussa Laksana Utama | Jilles Steeve Dibangoye | Tsegaye Misikir Tashu
Proceedings of the Second Workshop on Language Models for Low-Resource Languages (LoResLM 2026)
Cross-Lingual Emotion Recognition (CLER) remains a formidable challenge for ultra-low-resource languages like Balinese due to the scarcity of high-quality annotated data and the performance limitations of traditional multilingual models. This study addresses these gaps through two primary contributions. First, we present a newly created multi-label Balinese emotion dataset annotated by a panel of experts in Balinese linguistics and psychology. Second, we propose the Multi-Agent Peer Collaboration (MAPC) framework, which transforms the multi-label classification problem into a series of independent binary tasks to leverage the collaborative reasoning of Large Language Models (LLMs). We evaluated the framework against the LaBSE multilingual model and three LLMs of varying scales under zero-shot and few-shot settings using the Macro-F1 measure. The experimental results showed that LLMs significantly outperform traditional Pre-trained Language Models (PLMs). MAPC achieved an overall macro F1-score of 63.95, which was higher than the individual baselines in both zero-shot and few-shot settings. Analysis shows that while some models exhibit sensitivity to few-shot prompting in low-resource contexts, the MAPC review and revision process consistently improves individual reasoning and provides a more accurate final classification.
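A minimal runnable sketch of the decomposition idea described above: turning one multi-label emotion classification problem into independent binary decisions, one per label. The emotion inventory and the keyword-based judge() stub are hypothetical placeholders, not the MAPC framework itself (in which each binary decision would be made by an LLM agent and then reviewed and revised by peers).

```python
EMOTIONS = ["joy", "sadness", "anger", "fear"]  # hypothetical label set

def judge(text: str, emotion: str) -> bool:
    """Stand-in for one agent's binary decision on a single emotion.

    In the actual framework this would be an LLM call whose answer is
    reviewed and revised by peer agents; here it is a trivial keyword
    rule so the sketch stays runnable.
    """
    keywords = {
        "joy": ["happy", "glad"],
        "sadness": ["sad", "cry"],
        "anger": ["angry", "furious"],
        "fear": ["afraid", "scared"],
    }
    return any(k in text.lower() for k in keywords[emotion])

def classify_multilabel(text: str) -> dict:
    """Run one independent binary task per emotion and merge the results."""
    return {emotion: judge(text, emotion) for emotion in EMOTIONS}

print(classify_multilabel("I was so happy, yet afraid of what comes next"))
```

The point of the decomposition is that each binary task can be reasoned about (and debated by agents) in isolation, and the final multi-label prediction is just the union of the per-label verdicts.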
2025
Confidence Calibration in Large Language Model-Based Entity Matching
Iris Kamsteeg | Juan Cardenas-Cartagena | Floris van Beers | Tsegaye Misikir Tashu | Matias Valdenegro-Toro
Proceedings of the 2nd Workshop on Uncertainty-Aware NLP (UncertaiNLP 2025)
This research aims to explore the intersection of Large Language Models and confidence calibration in Entity Matching. To this end, we perform an empirical study to compare baseline RoBERTa confidences for an Entity Matching task against confidences that are calibrated using Temperature Scaling, Monte Carlo Dropout and Ensembles. We use the Abt-Buy, DBLP-ACM, iTunes-Amazon and Company datasets. The findings indicate that the proposed modified RoBERTa model exhibits a slight overconfidence, with Expected Calibration Error scores ranging from 0.0043 to 0.0552 across datasets. We find that this overconfidence can be mitigated using Temperature Scaling, reducing Expected Calibration Error scores by up to 23.83%.
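A compact sketch of the two ingredients named in the abstract: temperature scaling (dividing logits by a temperature T before the softmax, which softens overconfident predictions for T > 1) and Expected Calibration Error (the bin-weighted gap between average confidence and accuracy). The logits and bin count below are illustrative, not taken from the paper's setup.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature: T > 1 lowers the top confidence."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: |accuracy - avg confidence| per equal-width bin,
    weighted by the fraction of samples falling in that bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

probs_t1 = softmax([3.2, 0.4, -1.1])
probs_t2 = softmax([3.2, 0.4, -1.1], temperature=2.0)
print(max(probs_t1), max(probs_t2))  # the higher temperature lowers the top confidence
```

An overconfident model shows average confidence above accuracy; fitting T on a held-out set and rescaling the logits shrinks exactly the gap that ECE measures.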
Cross-Lingual Document Recommendations with Transformer-Based Representations: Evaluating Multilingual Models and Mapping Techniques
Tsegaye Misikir Tashu | Eduard-Raul Kontos | Matthia Sabatelli | Matias Valdenegro-Toro
Proceedings of the Second Workshop on Scaling Up Multilingual & Multi-Cultural Evaluation
Recommendation systems for documents have become essential tools for finding relevant content on the Web. However, these systems have limitations when it comes to recommending documents in languages different from the query language, which means they might overlook resources in non-native languages. This research focuses on representing documents across languages by using Transformer Leveraged Document Representations (TLDRs) that are mapped to a cross-lingual domain. Four multilingual pre-trained transformer models (mBERT, mT5, XLM-RoBERTa, ErnieM) were evaluated using three mapping methods across 20 language pairs, representing combinations of five selected languages of the European Union. Metrics such as Mate Retrieval Rate and Reciprocal Rank were used to measure the effectiveness of mapped TLDRs compared to non-mapped ones. The results highlight the power of cross-lingual representations achieved through pre-trained transformers and mapping approaches, suggesting a promising direction for expanding retrieval beyond direct connections between two specific languages.
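A small illustration of the Mate Retrieval Rate evaluation mentioned in the abstract: given index-aligned document embeddings in two languages, it measures how often a source document's nearest target-language neighbour (by cosine similarity) is its own translation mate. The toy vectors are invented for the sketch, not the paper's TLDRs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mate_retrieval_rate(src, tgt):
    """Fraction of source documents whose nearest target-language
    neighbour is the vector at the same index (its translation mate)."""
    hits = 0
    for i, s in enumerate(src):
        nearest = max(range(len(tgt)), key=lambda j: cosine(s, tgt[j]))
        if nearest == i:
            hits += 1
    return hits / len(src)

# Toy aligned embeddings: each target vector is a noisy copy of its mate.
src = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
tgt = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
print(mate_retrieval_rate(src, tgt))  # prints 1.0
```

A well-mapped cross-lingual space drives this rate toward 1.0; an unmapped pair of monolingual spaces typically scores far lower because mates are not near each other.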
Mapping Cross-Lingual Sentence Representations for Low-Resource Language Pairs Using Pre-trained Language Models
Andreea Ioana Tudor | Tsegaye Misikir Tashu
Proceedings of the First Workshop on Language Models for Low-Resource Languages
In this work, we explore different linear mapping techniques to learn cross-lingual document representations from pre-trained multilingual large language models for low-resource languages. Three mapping techniques, namely Linear Concept Approximation (LCA), Linear Concept Compression (LCC), and Neural Concept Approximation (NCA), and four multilingual language models, mBERT, mT5, XLM-R, and ErnieM, were used to extract embeddings. The inter-lingual representations were created by mapping the monolingual representations extracted from the multilingual language models. The experimental results showed that LCA and LCC significantly outperform NCA, with models like ErnieM achieving the highest alignment quality. Language pairs exhibit variable performance, influenced by linguistic similarity and data availability, with the Amharic-English pair yielding particularly high scores. The results demonstrate the utility of LCA and LCC in enabling cross-lingual tasks for low-resource languages.
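A minimal sketch of the general idea behind such mapping techniques: learning a linear transform W between two monolingual embedding spaces from aligned document pairs. This is plain least squares on synthetic data, not the exact LCA/LCC/NCA formulations, and the dimensions and data below are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))     # source-language document embeddings
W_true = rng.normal(size=(8, 8))  # hidden "true" cross-lingual relation
Y = X @ W_true                    # aligned target-language embeddings

# Solve min_W ||X W - Y||_F over the aligned training pairs.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Mapped source embeddings should land on their target mates.
error = np.linalg.norm(X @ W - Y) / np.linalg.norm(Y)
print(f"relative mapping error: {error:.2e}")
```

With real embeddings the relation is only approximately linear, so the residual stays nonzero; the paper's finding is that linear maps of this kind (LCA/LCC) nonetheless align the spaces well, even for low-resource pairs such as Amharic-English.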
Difficulty Estimation in Natural Language Tasks with Action Scores
Aleksandar Angelov | Tsegaye Misikir Tashu | Matias Valdenegro-Toro
Proceedings of the 5th Workshop on Trustworthy NLP (TrustNLP 2025)
This study investigates the effectiveness of the action score, a metric originally developed for computer vision tasks, in estimating sample difficulty across various natural language processing (NLP) tasks. Using transformer-based models, the action score is applied to sentiment analysis, natural language inference, and abstractive text summarization. The results demonstrate that the action score can effectively identify challenging samples in sentiment analysis and natural language inference, often capturing difficult instances that are missed by more established metrics like entropy. However, the effectiveness of the action score appears to be task-dependent, as evidenced by its performance in the abstractive text summarization task, where it exhibits a nearly linear relationship with entropy. The findings suggest that the action score can provide valuable insights into the characteristics of challenging samples in NLP tasks, particularly in classification settings. However, its application should be carefully considered in the context of each specific task and in light of emerging research on the potential value of hard samples in machine learning.
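For context, a sketch of the entropy baseline that the abstract compares the action score against: ranking samples by the Shannon entropy of the model's predictive distribution, where higher entropy is taken as higher difficulty. The sample names and probability vectors are invented; the action score itself is not reproduced here.

```python
import math

def predictive_entropy(probs):
    """Shannon entropy of a predictive distribution; higher values
    mean the model is less certain, a common proxy for difficulty."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical class-probability outputs for three sentiment samples.
samples = {
    "clearly positive review": [0.97, 0.02, 0.01],
    "mixed review": [0.50, 0.30, 0.20],
    "ambiguous fragment": [0.34, 0.33, 0.33],
}
ranked = sorted(samples, key=lambda s: predictive_entropy(samples[s]))
print(ranked)  # ['clearly positive review', 'mixed review', 'ambiguous fragment']
```

The paper's observation that the action score is nearly linear in entropy for summarization means a ranking like the one above would barely change between the two metrics on that task, whereas in classification the action score surfaces hard samples entropy misses.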