2023
UMUTeam at SemEval-2023 Task 12: Ensemble Learning of LLMs applied to Sentiment Analysis for Low-resource African Languages
José Antonio García-Díaz | Camilo Caparros-laiz | Ángela Almela | Gema Alcaráz-Mármol | María José Marín-Pérez | Rafael Valencia-García
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
These working notes summarize the participation of the UMUTeam in the SemEval 2023 shared task: AfriSenti, focused on Sentiment Analysis in several African languages. Two subtasks are proposed, one in which each language is considered separately and another one in which all languages are merged. Our proposal to solve both subtasks is grounded on the combination of features extracted from several multilingual Large Language Models and a subset of language-independent linguistic features. Our best results are achieved with the African languages less represented in the training set: Xitsonga, a Mozambique dialect, with a weighted f1-score of 54.89%; Algerian Arabic, with a weighted f1-score of 68.52%; Swahili, with a weighted f1-score of 60.52%; and Twi, with a weighted f1-score of 71.14%.
2022
UMUTextStats: A linguistic feature extraction tool for Spanish
José Antonio García-Díaz | Pedro José Vivancos-Vicente | Ángela Almela | Rafael Valencia-García
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Feature Engineering consists in the application of domain knowledge to select and transform relevant features to build efficient machine learning models. In the Natural Language Processing field, the state of the art concerning automatic document classification tasks relies on word and sentence embeddings built upon deep learning models based on transformers that have outperformed the competition in several tasks. However, the models built from these embeddings are usually difficult to interpret. In contrast, linguistic features are easy to understand, they result in simpler models, and they usually achieve encouraging results. Moreover, both linguistic features and embeddings can be combined with different strategies which result in more reliable machine-learning models. The de facto tool for extracting linguistic features in Spanish is LIWC. However, this software does not consider specific linguistic phenomena of Spanish such as grammatical gender and lacks certain verb tenses. In order to solve these drawbacks, we have developed UMUTextStats, a linguistic extraction tool designed from scratch for Spanish. Furthermore, this tool has been validated to conduct different experiments in areas such as infodemiology, hate-speech detection, author profiling, authorship verification, humour or irony detection, among others. The results indicate that the combination of linguistic features and embeddings based on transformers is beneficial in automatic document classification.
2012
Seeing through Deception: A Computational Approach to Deceit Detection in Written Communication
Ángela Almela | Rafael Valencia-García | Pascual Cantos
Proceedings of the Workshop on Computational Approaches to Deception Detection