Marcelo Mendoza

2025

LogitRouter: a novel Attention variant for reducing Myopic Routing in Mixture of Experts
Felipe Rodriguez | Marcelo Mendoza
Proceedings of the 18th International Natural Language Generation Conference

Mixture-of-Experts (MoE) models have emerged as strong alternatives to traditional transformers, offering significant advantages in training and inference efficiency. At the core of this architecture lies the router, which selects the experts activated for each token. Despite these advances, routing mechanisms continue to face stability challenges that the basic architecture fails to fully address. One such issue is Myopic Routing, where each token determines its route independently, without considering the routing decisions made for other tokens. To address this limitation, the LogitAttention mechanism, a variant of traditional attention, is introduced and, building upon it, the LogitRouter, a novel routing architecture that incorporates contextual information about the routing of other tokens. Due to budget constraints, a set of simple experiments is designed to obtain preliminary evidence of performance trends. These experiments are evaluated on established benchmarks such as BoolQ, MMLU, and ARC. Finally, the work concludes with an in-depth discussion of architectural variants, applicability, limitations, and future directions, which aims to support continued research in this area.

Hate speech detection is vital for creating safe online environments, as harmful content can drive social polarization. This study explores the impact of enriching text with intent and group tags on machine performance and human moderation workflows. For machine performance, we enriched text with intent and group tags to train hate speech classifiers. Intent tags were the most effective, achieving state-of-the-art F1-score improvements on the IHC, SBIC, and DH datasets. Cross-dataset evaluations further demonstrated the superior generalization of intent-tagged models compared to other pre-trained approaches. Then, through a user study (N=100), we evaluated seven moderation settings, including intent tags, group tags, model probabilities, and randomized counterparts. Intent annotations significantly improved the accuracy of the moderators, allowing them to outperform machine classifiers by 12.9%. Moderators also rated intent tags as the most useful explanation tool, with a 41% increase in perceived helpfulness over the control group. Our findings demonstrate that intent-based annotations enhance both machine classification performance and human moderation workflows.

2022

Due to the success of pre-trained language models, versions for languages other than English have been released in recent years. This fact implies the need for resources to evaluate these models. In the case of Spanish, there are few ways to systematically assess the models’ quality. In this paper, we narrow the gap by building two evaluation benchmarks. Inspired by previous work (Conneau and Kiela, 2018; Chen et al., 2019), we introduce Spanish SentEval and Spanish DiscoEval, aiming to assess the capabilities of stand-alone and discourse-aware sentence representations, respectively. Our benchmarks include a considerable number of pre-existing and newly constructed datasets that address different tasks from various domains. In addition, we evaluate and analyze the most recent pre-trained Spanish language models to show their capabilities and limitations. As an example, we discover that for discourse evaluation tasks, mBERT, a language model trained on multiple languages, usually provides a richer latent representation than models trained only on documents in Spanish. We hope our contribution will motivate a fairer, more comparable, and less cumbersome way to evaluate future Spanish language models.

2021

Inspecting the concept knowledge graph encoded by modern language models
Carlos Aspillaga | Marcelo Mendoza | Alvaro Soto
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations
Vladimir Araujo | Andrés Villa | Marcelo Mendoza | Marie-Francine Moens | Alvaro Soto
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Current language models are usually trained using a self-supervised scheme, where the main focus is learning representations at the word or sentence level. However, there has been limited progress in generating useful discourse-level representations. In this work, we propose to use ideas from predictive coding theory to augment BERT-style language models with a mechanism that allows them to learn suitable discourse-level representations. As a result, our proposed approach is able to predict future sentences using explicit top-down connections that operate at the intermediate layers of the network. By experimenting with benchmarks designed to evaluate discourse-related knowledge using pre-trained sentence representations, we demonstrate that our approach improves performance in 6 out of 11 tasks by excelling in discourse relationship detection.