Hiram Calvo


2025

Immediate Textual Variation (ITV) is defined as the process of introducing changes during text transmission from one node to another. One-step variation can be useful for testing specific philological hypotheses. In this paper, we propose using Large Language Models (LLMs) as text-modifying agents. We analyze three scenarios: (1) simple variations (omissions), (2) paraphrasing, and (3) paraphrasing with bias injection (polarity). We generate simulated news items using a predefined scheme. We hypothesize that central tendency measures—such as the mean and median vectors in the feature space of sentence transformers—can effectively approximate the original text representation. Our findings indicate that the median vector is a more accurate estimator of the original vector than most alternatives. However, in cases involving substantial rephrasing, the agent that produces the least semantic drift provides the best estimation, aligning with the principles of Bédierian textual criticism.
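The central-tendency estimation described above can be sketched as follows. This is an illustrative example, not the authors' code: synthetic vectors stand in for sentence-transformer embeddings, and the noise scales are invented to mimic mild variation plus one heavily drifting agent.

```python
# Illustrative sketch: given embeddings of several variant texts, estimate the
# original text's embedding via the mean and the component-wise median, then
# compare each estimate to the true vector by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
dim = 16

original = rng.normal(size=dim)

# Simulate variants: nine agents with small Gaussian drift,
# plus one heavily drifted "outlier" agent.
variants = original + rng.normal(scale=0.1, size=(9, dim))
variants = np.vstack([variants, original + rng.normal(scale=2.0, size=dim)])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

mean_est = variants.mean(axis=0)
median_est = np.median(variants, axis=0)  # robust to the outlier agent

print("mean   similarity:", cosine(mean_est, original))
print("median similarity:", cosine(median_est, original))
```

Because the component-wise median discounts the outlier agent, it typically stays closer to the original vector than the mean, mirroring the paper's finding.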
This system description paper for the SemEval-2025 workshop Task A presents a multi-step approach to multi-label emotion classification using machine learning and deep learning models. We test our methodology on English, Spanish, and low-resource Yoruba datasets, each labeled with five emotion categories: anger, fear, joy, sadness, and surprise. Our preprocessing involves text cleaning and feature extraction using bigrams and TF-IDF. We employ logistic regression as a baseline classifier and fine-tune Transformer models, such as BERT and XLM-RoBERTa, for improved performance. The Transformer-based models outperformed the logistic regression model, achieving micro-F1 scores of 0.7061, 0.7321, and 0.2825 for English, Spanish, and Yoruba, respectively. Notably, our fine-tuned Yoruba model outperformed the task organizers' baseline model, which achieved a micro-F1 score of 0.092, demonstrating the effectiveness of Transformer models in handling emotion classification across diverse languages.
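The baseline pipeline described above can be sketched with scikit-learn. This is a hedged illustration, not the submitted system: the toy texts and multi-hot labels below are invented, and the real system adds text cleaning and Transformer fine-tuning on top of this baseline.

```python
# Sketch of the baseline: TF-IDF features (unigrams + bigrams) with a
# one-vs-rest logistic regression per emotion label, scored with micro-F1.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.multioutput import MultiOutputClassifier

LABELS = ["anger", "fear", "joy", "sadness", "surprise"]

texts = [
    "I am furious about the delay",
    "What a wonderful surprise party",
    "I feel so sad and afraid tonight",
    "Pure joy, I cannot stop smiling",
    "That noise scared me badly",
    "This betrayal makes me angry and sad",
]
# Multi-hot label matrix: one column per emotion in LABELS (invented labels).
y = np.array([
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0],
])

vec = TfidfVectorizer(ngram_range=(1, 2), lowercase=True)
X = vec.fit_transform(texts)

clf = MultiOutputClassifier(LogisticRegression(max_iter=1000))
clf.fit(X, y)

pred = clf.predict(X)
print("train micro-F1:", f1_score(y, pred, average="micro"))
```

Micro-averaged F1 is the task metric here because it weights every (sample, label) decision equally, which suits the imbalanced multi-label setting.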

2023

In this paper, we share our best-performing submission to the Arabic AI Tasks Evaluation Challenge (ArAIEval) at ArabicNLP 2023. Our focus was on Task 1, which involves identifying persuasion techniques in excerpts from tweets and news articles. We detected persuasion techniques in Arabic texts by fine-tuning XLM-RoBERTa, a language-agnostic text representation model, in a standard training loop; leveraging a fine-tuned multilingual language model proved effective. On the test set, we achieved a micro-F1 score of 0.64 for subtask A of the competition.
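The fine-tuning training loop referred to above can be sketched schematically. This is a stand-in, not the actual system: to keep the example self-contained and runnable without downloading XLM-RoBERTa, random vectors replace the pooled encoder representations and a small classification head is trained in their place; the loop structure (forward pass, cross-entropy loss, backward pass, optimizer step) is what carries over.

```python
# Schematic classification fine-tuning loop. In the real system, `features`
# would be XLM-RoBERTa representations of Arabic texts and the whole encoder
# would be updated; here random tensors stand in so the sketch runs anywhere.
import torch
from torch import nn

torch.manual_seed(0)

n_samples, feat_dim, hidden, n_classes = 64, 32, 16, 3

features = torch.randn(n_samples, feat_dim)        # stand-in for encoder output
labels = torch.randint(0, n_classes, (n_samples,))  # stand-in technique labels

model = nn.Sequential(
    nn.Linear(feat_dim, hidden),
    nn.ReLU(),
    nn.Linear(hidden, n_classes),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

losses = []
for epoch in range(30):
    opt.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print("first/last loss:", losses[0], losses[-1])
```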
