Pablo Gamallo

2025

pdf bib abs
Truth Knows No Language: Evaluating Truthfulness Beyond English
Blanca Calvo Figueras | Eneko Sagarzazu | Julen Etxaniz | Jeremy Barnes | Pablo Gamallo | Iria de-Dios-Flores | Rodrigo Agerri
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

We introduce a professionally translated extension of the TruthfulQA benchmark designed to evaluate truthfulness in Basque, Catalan, Galician, and Spanish. Truthfulness evaluations of large language models (LLMs) have primarily been focused on English. However, the ability of LLMs to maintain truthfulness across languages remains under-explored. Our study evaluates 12 state-of-the-art open LLMs, comparing base and instruction-tuned models using human evaluation, multiple-choice metrics, and LLM-as-a-Judge scoring. Our findings reveal that, while LLMs perform best in English and worst in Basque (the lowest-resourced language), overall truthfulness discrepancies across languages are smaller than anticipated. Furthermore, we show that LLM-as-a-Judge correlates more closely with human judgments than multiple-choice metrics, and that informativeness plays a critical role in truthfulness assessment. Our results also indicate that machine translation provides a viable approach for extending truthfulness benchmarks to additional languages, offering a scalable alternative to professional translation. Finally, we observe that universal knowledge questions are better handled across languages than context- and time-dependent ones, highlighting the need for truthfulness evaluations that account for cultural and temporal variability. Datasets, models and code are publicly available under open licenses.

pdf bib abs
Detecting Hyperpartisanship and Rhetorical Bias in Climate Journalism: A Sentence-Level Italian Dataset
Michele Joshua Maggini | Davide Bassi | Pablo Gamallo
Proceedings of the 2nd Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2025)

We present the first Italian dataset for joint hyperpartisan and rhetorical bias detection in climate change discourse. The dataset comprises 48 articles (1,010 sentences) from far-right media outlets, annotated at sentence level for both binary hyperpartisan classification and a fine-grained taxonomy of 17 rhetorical biases. Our annotation scheme achieves a Cohen’s kappa agreement of 0.63 on the gold test set (173 sentences), demonstrating the complexity and reliability of the task. We conduct extensive analysis revealing significant correlations between hyperpartisan content and specific rhetorical techniques, particularly in climate change, Euroscepticism, and green policy coverage. To the best of our knowledge, we are the first to tackle hyperpartisan detection related to logical fallacies. Indeed, we studied their correlation. Moreover, up to our knowledge no previous work focused on hyperpartisan at sentence level. Our experiments with state-of-the-art language models (GPT-4o-mini) and Italian BERTbase models establish strong baselines for both tasks, while highlighting the challenges in detecting subtle manipulation strategies applied with rhetorical biases. To ensure reproducibility while addressing copyright concerns, we release article URLs, article id and paragraph’s number alongside comprehensive annotation guidelines. This resource advances research in cross-lingual propaganda detection and provides insights into the rhetorical strategies employed in Italian climate change discourse. We provide the code and the dataset to reproduce our results: https://anonymous.4open.science/r/Climate_HP-RB-D5EF/README.md

The current best practice to measure the performance of base Large Language Models is to establish a multi-task benchmark that covers a range of capabilities of interest. Currently, however, such benchmarks are only available in a few high-resource languages. To address this situation, we present IberoBench, a multilingual, multi-task benchmark for Iberian languages (i.e., Basque, Catalan, Galician, European Spanish and European Portuguese) built on the LM Evaluation Harness framework. The benchmark consists of 62 tasks divided into 179 subtasks. We evaluate 33 existing LLMs on IberoBench on 0- and 5-shot settings. We also explore the issues we encounter when working with the Harness and our approach to solving them to ensure high-quality evaluation.

pdf bib abs
A Continuous Approach to Metaphorically Motivated Regular Polysemy in Language Models
Anna Temerko | Marcos Garcia | Pablo Gamallo
Proceedings of the 29th Conference on Computational Natural Language Learning

Linguistic accounts show that a word’s polysemy structure is largely governed by systematic sense alternations that form overarching patterns across the vocabulary. While psycholinguistic studies confirm the psychological validity of regularity in human language processing, in the research on large language models (LLMs) this phenomenon remains largely unaddressed. Revealing models’ sensitivity to systematic sense alternations of polysemous words can give us a better understanding of how LLMs process ambiguity and to what extent they emulate representations in the human mind. For this, we employ the measures of surprisal and semantic similarity as proxies of human judgment on the acceptability of novel senses. We focus on two aspects that have not received much attention previously —metaphorically motivated patterns and the continuous nature of regularity. We find evidence that surprisal from language models represents regularity of polysemic extensions in a human-like way, discriminating between different types of senses and varying regularity degrees, and overall strongly correlating with human acceptability scores.

pdf bib abs
Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study
Pablo Rodríguez | Silvia Paniagua Suárez | Pablo Gamallo | Susana Sotelo Docio
Findings of the Association for Computational Linguistics: ACL 2025

Recent advances in Large Language Models (LLMs) have led to remarkable improvements in language understanding and text generation. However, challenges remain in enhancing their performance for underrepresented languages, ensuring continual learning without catastrophic forgetting, and developing robust evaluation methodologies. This work addresses these issues by investigating the impact of Continued Pretraining (CPT) on multilingual models and proposing a comprehensive evaluation framework for LLMs, focusing on the case of Galician language. Our first contribution explores CPT strategies for languages with limited representation in multilingual models. We analyze how CPT with Galician corpora improves text generation while assessing the trade-offs between linguistic enrichment and task-solving capabilities. Our findings show that CPT with small, high-quality corpora and diverse instructions enhances both task performance and linguistic quality. Our second contribution is a structured evaluation framework based on distinguishing task-based and language-based assessments, leveraging existing and newly developed benchmarks for Galician. Additionally, we contribute new Galician LLMs, datasets for evaluation and instructions, and an evaluation framework.

2024

pdf bib
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
Pablo Gamallo | Daniela Claro | António Teixeira | Livy Real | Marcos Garcia | Hugo Gonçalo Oliveira | Raquel Amaro
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1

pdf bib
CorpusNÓS: A massive Galician corpus for training large language models
Iria de-Dios-Flores | Silvia Paniagua Suárez | Cristina Carbajal Pérez | Daniel Bardanca Outeiriño | Marcos Garcia | Pablo Gamallo
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1

pdf bib
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 2
Pablo Gamallo | Daniela Claro | António Teixeira | Livy Real | Marcos Garcia | Hugo Gonçalo Oliveira | Raquel Amaro
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 2

In this paper, we present the two strategies employed for the WMT24 Shared Task on Translation into Low-Resource Languages of Spain. We participated in the language pairs of Spanish-to-Aragonese, Spanish-to-Aranese, and Spanish-to-Asturian, developing neural-based translation systems and moving away from rule-based approaches for these language directions. To create these models, two distinct strategies were employed. The first strategy involved a thorough cleaning process and curation of the limited provided data, followed by fine-tuning the multilingual NLLB-200-600M model (Constrained Submission). The other strategy involved training a transformer from scratch using a vast amount of synthetic data (Open Submission). Both approaches relied on generated synthetic data and resulted in high ChrF and BLEU scores. However, given the characteristics of the task, the strategy used in the Constrained Submission resulted in higher scores that surpassed the baselines across the three translation directions, whereas the strategy employed in the Open Submission yielded slightly lower scores than the highest baseline.

2022

The development of language technologies (LTs) such as machine translation, text analytics, and dialogue systems is essential in the current digital society, culture and economy. These LTs, widely supported in languages in high demand worldwide, such as English, are also necessary for smaller and less economically powerful languages, as they are a driving force in the democratization of the communities that use them due to their great social and cultural impact. As an example, dialogue systems allow us to communicate with machines in our own language; machine translation increases access to contents in different languages, thus facilitating intercultural relations; and text-to-speech and speech-to-text systems broaden different categories of users’ access to technology. In the case of Galician (co-official language, together with Spanish, in the autonomous region of Galicia, located in northwestern Spain), incorporating the language into state-of-the-art AI applications can not only significantly favor its prestige (a decisive factor in language normalization), but also guarantee citizens’ language rights, reduce social inequality, and narrow the digital divide. This is the main motivation behind the Nós Project (Proxecto Nós), which aims to have a significant contribution to the development of LTs in Galician (currently considered a low-resource language) by providing openly licensed resources, tools, and demonstrators in the area of intelligent technologies.

2020

pdf bib abs
CitiusNLP at SemEval-2020 Task 3: Comparing Two Approaches for Word Vector Contextualization
Pablo Gamallo
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This article describes some unsupervised strategies submitted to SemEval 2020 Task 3, a task which consists of considering the effect of context to compute word similarity. More precisely, given two words in context, the system must predict the degree of similarity of those words considering the context in which they occur, and the system score is compared against human prediction. We compare one approach based on pre-trained BERT models with other strategy relying on static word embeddings and syntactic dependencies. The BERT-based method clearly outperformed the dependency-based strategy.

2019

pdf bib abs
Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora
Pablo Gamallo | Susana Sotelo | José Ramom Pichel | Mikel Artetxe
Computational Linguistics, Volume 45, Issue 3 - September 2019

This article describes a compositional distributional method to generate contextualized senses of words and identify their appropriate translations in the target language using monolingual corpora. Word translation is modeled in the same way as contextualization of word meaning, but in a bilingual vector space. The contextualization of meaning is carried out by means of distributional composition within a structured vector space with syntactic dependencies, and the bilingual space is created by means of transfer rules and a bilingual dictionary. A phrase in the source language, consisting of a head and a dependent, is translated into the target language by selecting both the nearest neighbor of the head given the dependent, and the nearest neighbor of the dependent given the head. This process is expanded to larger phrases by means of incremental composition. Experiments were performed on English and Spanish monolingual corpora in order to translate phrasal verbs in context. A new bilingual data set to evaluate strategies aimed at translating phrasal verbs in restricted syntactic domains has been created and released.

pdf bib abs
CiTIUS-COLE at SemEval-2019 Task 5: Combining Linguistic Features to Identify Hate Speech Against Immigrants and Women on Multilingual Tweets
Sattam Almatarneh | Pablo Gamallo | Francisco J. Ribadas Pena
Proceedings of the 13th International Workshop on Semantic Evaluation

This article describes the strategy submitted by the CiTIUS-COLE team to SemEval 2019 Task 5, a task which consists of binary classi- fication where the system predicting whether a tweet in English or in Spanish is hateful against women or immigrants or not. The proposed strategy relies on combining linguis- tic features to improve the classifier’s perfor- mance. More precisely, the method combines textual and lexical features, embedding words with the bag of words in Term Frequency- Inverse Document Frequency (TF-IDF) repre- sentation. The system performance reaches about 81% F1 when it is applied to the training dataset, but its F1 drops to 36% on the official test dataset for the English and 64% for the Spanish language concerning the hate speech class

pdf bib abs
Unsupervised Compositional Translation of Multiword Expressions
Pablo Gamallo | Marcos Garcia
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

This article describes a dependency-based strategy that uses compositional distributional semantics and cross-lingual word embeddings to translate multiword expressions (MWEs). Our unsupervised approach performs translation as a process of word contextualization by taking into account lexico-syntactic contexts and selectional preferences. This strategy is suited to translate phraseological combinations and phrases whose constituent words are lexically restricted by each other. Several experiments in adjective-noun and verb-object compounds show that mutual contextualization (co-compositionality) clearly outperforms other compositional methods. The paper also contributes with a new freely available dataset of English-Spanish MWEs used to validate the proposed compositional strategy.

2018

pdf bib abs
CitiusNLP at SemEval-2018 Task 10: The Use of Transparent Distributional Models and Salient Contexts to Discriminate Word Attributes
Pablo Gamallo
Proceedings of the 12th International Workshop on Semantic Evaluation

This article describes the unsupervised strategy submitted by the CitiusNLP team to the SemEval 2018 Task 10, a task which consists of predict whether a word is a discriminative attribute between two other words. Our strategy relies on the correspondence between discriminative attributes and relevant contexts of a word. More precisely, the method uses transparent distributional models to extract salient contexts of words which are identified as discriminative attributes. The system performance reaches about 70% accuracy when it is applied on the development dataset, but its accuracy goes down (63%) on the official test dataset.

pdf bib abs
Measuring language distance among historical varieties using perplexity. Application to European Portuguese.
Jose Ramom Pichel Campos | Pablo Gamallo | Iñaki Alegria
Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018)

The objective of this work is to quantify, with a simple and robust measure, the distance between historical varieties of a language. The measure will be inferred from text corpora corresponding to historical periods. Different approaches have been proposed for similar aims: Language Identification, Phylogenetics, Historical Linguistics or Dialectology. In our approach, we used a perplexity-based measure to calculate language distance between all the historical periods of a specific language: European Portuguese. Perplexity has also proven to be a robust metric to calculate distance between languages. However, this measure has not been tested yet to identify diachronic periods within the historical evolution of a specific language. For this purpose, a historical Portuguese corpus has been constructed from different open sources containing texts with close original spelling. The results of our experiments show that Portuguese keeps an important degree of homogeneity over time. We anticipate this metric to be a starting point to be applied to other languages.

2017

pdf bib abs
A Web Interface for Diachronic Semantic Search in Spanish
Pablo Gamallo | Iván Rodríguez-Torres | Marcos Garcia
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

This article describes a semantic system which is based on distributional models obtained from a chronologically structured language resource, namely Google Books Syntactic Ngrams. The models were created using dependency-based contexts and a strategy for reducing the vector space, which consists in selecting the more informative and relevant word contexts. The system allowslinguists to analize meaning change of Spanish words in the written language across time.

pdf bib abs
A rule-based system for cross-lingual parsing of Romance languages with Universal Dependencies
Marcos Garcia | Pablo Gamallo
Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

This article describes MetaRomance, a rule-based cross-lingual parser for Romance languages submitted to CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. The system is an almost delexicalized parser which does not need training data to analyze Romance languages. It contains linguistically motivated rules based on PoS-tag patterns. The rules included in MetaRomance were developed in about 12 hours by one expert with no prior knowledge in Universal Dependencies, and can be easily extended using a transparent formalism. In this paper we compare the performance of MetaRomance with other supervised systems participating in the competition, paying special attention to the parsing of different treebanks of the same language. We also compare our system with a delexicalized parser for Romance languages, and take advantage of the harmonized annotation of Universal Dependencies to propose a language ranking based on the syntactic distance each variety has from Romance languages.

pdf bib abs
Citius at SemEval-2017 Task 2: Cross-Lingual Similarity from Comparable Corpora and Dependency-Based Contexts
Pablo Gamallo
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This article describes the distributional strategy submitted by the Citius team to the SemEval 2017 Task 2. Even though the team participated in two subtasks, namely monolingual and cross-lingual word similarity, the article is mainly focused on the cross-lingual subtask. Our method uses comparable corpora and syntactic dependencies to extract count-based and transparent bilingual distributional contexts. The evaluation of the results show that our method is competitive with other cross-lingual strategies, even those using aligned and parallel texts.

pdf bib abs
A Perplexity-Based Method for Similar Languages Discrimination
Pablo Gamallo | Jose Ramom Pichel | Iñaki Alegria
Proceedings of the Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial)

This article describes the system submitted by the Citius_Ixa_Imaxin team to the VarDial 2017 (DSL and GDI tasks). The strategy underlying our system is based on a language distance computed by means of model perplexity. The best model configuration we have tested is a voting system making use of several n-grams models of both words and characters, even if word unigrams turned out to be a very competitive model with reasonable results in the tasks we have participated. An error analysis has been performed in which we identified many test examples with no linguistic evidences to distinguish among the variants.

pdf bib abs
Compositional Semantics using Feature-Based Models from WordNet
Pablo Gamallo | Martín Pereira-Fariña
Proceedings of the 1st Workshop on Sense, Concept and Entity Representations and their Applications

This article describes a method to build semantic representations of composite expressions in a compositional way by using WordNet relations to represent the meaning of words. The meaning of a target word is modelled as a vector in which its semantically related words are assigned weights according to both the type of the relationship and the distance to the target word. Word vectors are compositionally combined by syntactic dependencies. Each syntactic dependency triggers two complementary compositional functions: the named head function and dependent function. The experiments show that the proposed compositional method outperforms the state-of-the-art for both intransitive subject-verb and transitive subject-verb-object constructions.

pdf bib abs
Sense Contextualization in a Dependency-Based Compositional Distributional Model
Pablo Gamallo
Proceedings of the 2nd Workshop on Representation Learning for NLP

Little attention has been paid to distributional compositional methods which employ syntactically structured vector models. As word vectors belonging to different syntactic categories have incompatible syntactic distributions, no trivial compositional operation can be applied to combine them into a new compositional vector. In this article, we generalize the method described by Erk and Padó (2009) by proposing a dependency-base framework that contextualize not only lemmas but also selectional preferences. The main contribution of the article is to expand their model to a fully compositional framework in which syntactic dependencies are put at the core of semantic composition. We claim that semantic composition is mainly driven by syntactic dependencies. Each syntactic dependency generates two new compositional vectors representing the contextualized sense of the two related lemmas. The sequential application of the compositional operations associated to the dependencies results in as many contextualized vectors as lemmas the composite expression contains. At the end of the semantic process, we do not obtain a single compositional vector representing the semantic denotation of the whole composite expression, but one contextualized vector for each lemma of the whole expression. Our method avoids the troublesome high-order tensor representations by defining lemmas and selectional restrictions as first-order tensors (i.e. standard vectors). A corpus-based experiment is performed to both evaluate the quality of the compositional vectors built with our strategy, and to compare them to other approaches on distributional compositional semantics. The experiments show that our dependency-based compositional method performs as (or even better than) the state-of-the-art.

2016

We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have an official status in the Iberian Peninsula. The corpus has been created by combining automatic collection and crowdsourcing approaches, and it is publicly available. It is intended for the development and testing of microtext machine translation systems. In this paper we describe the methodology followed to build the corpus, and present the results of the shared task in which it was tested.

pdf bib abs
Comparing Two Basic Methods for Discriminating Between Similar Languages and Varieties
Pablo Gamallo | Iñaki Alegria | José Ramom Pichel | Manex Agirrezabal
Proceedings of the Third Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial3)

This article describes the systems submitted by the Citius_Ixa_Imaxin team to the Discriminating Similar Languages Shared Task 2016. The systems are based on two different strategies: classification with ranked dictionaries and Naive Bayes classifiers. The results of the evaluation show that ranking dictionaries are more sound and stable across different domains while basic bayesian models perform reasonably well on in-domain datasets, but their performance drops when they are applied on out-of-domain texts.

2015

pdf bib
Dependency Parsing with Compression Rules
Pablo Gamallo
Proceedings of the 14th International Conference on Parsing Technologies

2014

pdf bib
An Entity-Centric Coreference Resolution System for Person Entities with Rich Linguistic Information
Marcos Garcia | Pablo Gamallo
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

In this paper we introduce TweetNorm_es, an annotated corpus of tweets in Spanish language, which we make publicly available under the terms of the CC-BY license. This corpus is intended for development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort from different research groups. In this paper we describe the methodology defined to build the corpus as well as the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, as the first evaluation environment where the corpus was used.

pdf bib abs
Multilingual corpora with coreferential annotation of person entities
Marcos Garcia | Pablo Gamallo
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents three corpora with coreferential annotation of person entities for Portuguese, Galician and Spanish. They contain coreference links between several types of pronouns (including elliptical, possessive, indefinite, demonstrative, relative and personal clitic and non-clitic pronouns) and nominal phrases (including proper nouns). Some statistics have been computed, showing distributional aspects of coreference both in journalistic and in encyclopedic texts. Furthermore, the paper shows the importance of coreference resolution for a task such as Information Extraction, by evaluating the output of an Open Information Extraction system on the annotated corpora. The corpora are freely distributed in two formats: (i) the SemEval-2010 and (ii) the brat rapid annotation tool, so they can be enlarged and improved collaboratively.

pdf bib
Citius: A Naive-Bayes Strategy for Sentiment Analysis on English Tweets
Pablo Gamallo | Marcos Garcia
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

This paper describes a specific semantic property underlying binary dependencies: co-composition. We propose a more general definition than that given by Pustejovsky, what we call “optional co-composition”. The aim of the paper is to explore the benefits of optional cocomposition in two disambiguation tasks: both word sense and structural disambiguation. Concerning the second task, some experiments were performed on large corpora.