Manuel Montes - ACL Anthology

Manuel Montes

Also published as: Manuel Montes y Gómez, Manuel Montes-y-Gómez, Manuel Montes y Gomez

2025

Text Graph Neural Networks for Detecting AI-Generated Content
Andric Valdez-Valenzuela | Helena Gómez-Adorno | Manuel Montes-y-Gómez
Proceedings of the 1st Workshop on GenAI Content Detection (GenAIDetect)

The widespread availability of Large Language Models (LLMs) such as GPT-4 and Llama-3, among others, has led to a surge in machine-generated content across various platforms, including social media, educational tools, and academic settings. While these models demonstrate remarkable capabilities in generating coherent text, their misuse raises significant concerns. For this reason, detecting machine-generated text has become a pressing need to mitigate these risks. This research proposed a novel classification method combining text-graph representations with Graph Neural Networks (GNNs) and different node feature initialization strategies to distinguish between human-written and machine-generated content. Experimental results demonstrate that the proposed approach outperforms traditional machine learning classifiers, highlighting the effectiveness of integrating structural and semantic relationships in text.
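The abstract does not say how the text graphs are built. One common construction, assumed here only for illustration, links words that co-occur within a sliding window. The resulting nodes and weighted edges are what a GNN would consume once node features have been initialized. The function name `build_cooccurrence_graph` and the window size are hypothetical, and this is not the authors' implementation:

```python
from collections import Counter


def build_cooccurrence_graph(tokens, window=2):
    """Return sorted unique nodes and weighted undirected edges that link words
    co-occurring within `window` positions (an assumed text-graph construction)."""
    nodes = sorted(set(tokens))
    edges = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                # Store each undirected edge with its endpoints in sorted order.
                edges[tuple(sorted((word, other)))] += 1
    return nodes, dict(edges)


nodes, edges = build_cooccurrence_graph("the model writes the text".split(), window=2)
# nodes == ['model', 'text', 'the', 'writes']
# edges[('model', 'the')] == 2 and edges[('the', 'writes')] == 2
```

Node feature initialization, the other ingredient mentioned in the abstract, is left out of this sketch.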

GAttention: Gated Attention for the Detection of Abusive Language
Horacio Jarquín Vásquez | Hugo Jair Escalante | Manuel Montes | Mario Ezra Aragon
Findings of the Association for Computational Linguistics: EMNLP 2025

Abusive language online creates toxic environments and exacerbates social tensions, underscoring the need for robust NLP models to interpret nuanced linguistic cues. This paper introduces GAttention, a novel Gated Attention mechanism that combines the strengths of Contextual attention and Self-attention mechanisms to address the limitations of existing attention models within the text classification task. GAttention capitalizes on local and global query vectors by integrating the internal relationships within a sequence (Self-attention) and the global relationships among distinct sequences (Contextual attention). This combination allows for a more nuanced understanding and processing of sequence elements, which is particularly beneficial in context-sensitive text classification tasks such as the case of abusive language detection. By applying this mechanism to transformer-based encoder models, we showcase how it enhances the model’s ability to discern subtle nuances and contextual clues essential for identifying abusive language, a challenging and increasingly relevant NLP task.

2023

DisorBERT: A Double Domain Adaptation Model for Detecting Signs of Mental Disorders in Social Media
Mario Ezra Aragón | A. Pastor López-Monroy | Luis C. González | David E. Losada | Manuel Montes-y-Gómez
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Mental disorders affect millions of people worldwide and interfere with their thinking and behavior. Over the past years, awareness raised by health campaigns and other sources has motivated the study of these disorders using information extracted from social media platforms. In this work, we aim to contribute to the study of these disorders and to the understanding of how mental health problems are reflected on social media. To achieve this goal, we propose a double domain adaptation of a language model. First, we adapted the model to social media language, and then we adapted it to the mental health domain. In both steps, we incorporated a lexical resource to guide the masking process of the language model and, therefore, to help it pay more attention to words related to mental disorders. We evaluated our model on detecting signs of three major mental disorders: anorexia, self-harm, and depression. The results are encouraging, as they show that the proposed adaptation improves classification performance and yields competitive results against state-of-the-art methods.
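One possible reading of "lexicon-guided masking" is sketched below under stated assumptions, not as the DisorBERT procedure. The masking budget is spent on lexicon words first, then on randomly chosen other tokens. The function `guided_mask_positions`, the toy lexicon, and the 25% masking rate are all hypothetical:

```python
import random


def guided_mask_positions(tokens, lexicon, mask_rate=0.25, seed=0):
    """Choose positions to mask, preferring tokens found in `lexicon`.
    Hypothetical strategy: lexicon hits fill the budget first, then random others."""
    budget = max(1, round(len(tokens) * mask_rate))
    rng = random.Random(seed)
    lexicon_pos = [i for i, t in enumerate(tokens) if t.lower() in lexicon]
    other_pos = [i for i, t in enumerate(tokens) if t.lower() not in lexicon]
    rng.shuffle(lexicon_pos)
    rng.shuffle(other_pos)
    return sorted((lexicon_pos + other_pos)[:budget])


tokens = "i have not slept well and feel anxious".split()
positions = guided_mask_positions(tokens, lexicon={"slept", "anxious"})
masked = ["[MASK]" if i in positions else t for i, t in enumerate(tokens)]
# positions == [3, 7]
# masked == ['i', 'have', 'not', '[MASK]', 'well', 'and', 'feel', '[MASK]']
```

A plain masked-language-model objective picks positions uniformly at random. Biasing the choice toward a domain lexicon is one way to make the model predict those words more often during adaptation.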

2021

Masking and Transformer-based Models for Hyperpartisanship Detection in News
Javier Sánchez-Junquera | Paolo Rosso | Manuel Montes-y-Gómez | Simone Paolo Ponzetto
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Hyperpartisan news shows an extreme manipulation of reality based on an underlying and extreme ideological orientation. Because of its harmful effects in reinforcing people’s biases and shaping their subsequent behavior, hyperpartisan news detection has become an important task for computational linguists. In this paper, we evaluate two different approaches to detecting hyperpartisan news. The first is a text masking technique that allows us to compare style-based and topic-related features from a different perspective than previous work. The second uses the transformer-based models BERT, XLM-RoBERTa, and M-BERT, known for their ability to capture semantic and syntactic patterns in the same representation. Our results corroborate previous research on this task in that topic-related features yield better results than style-based ones, although they also highlight the relevance of using longer n-grams. Furthermore, they show that transformer-based models are more effective than traditional methods, but at the cost of greater computational complexity and lack of transparency. Based on our experiments, we conclude that the beginning of a news article contains relevant information that helps the transformers distinguish effectively between left-wing, mainstream, and right-wing orientations.
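The abstract does not detail the masking scheme. The sketch below assumes a simple variant in the spirit of text distortion. Words in a chosen vocabulary (here a tiny, hypothetical set of function words) are kept, and every other word is replaced by asterisks of the same length, which preserves style cues. Inverting the choice keeps the topic words instead. `mask_text` is a hypothetical helper, not the paper's code:

```python
def mask_text(text, vocabulary, keep=True):
    """Replace each character of a masked word with '*'.
    keep=True: keep words in `vocabulary` and mask the rest (style-oriented view).
    keep=False: mask words in `vocabulary` and keep the rest (topic-oriented view)."""
    masked = []
    for word in text.split():
        in_vocab = word.lower() in vocabulary
        masked.append(word if in_vocab == keep else "*" * len(word))
    return " ".join(masked)


function_words = {"the", "of", "and", "a", "to", "in"}
sentence = "The leader of the party spoke"
style_view = mask_text(sentence, function_words, keep=True)
topic_view = mask_text(sentence, function_words, keep=False)
# style_view == "The ****** of the ***** *****"
# topic_view == "*** leader ** *** party spoke"
```

Training the same classifier on each view is one way to compare how much signal comes from style and how much from topic.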

Multimodal Weighted Fusion of Transformers for Movie Genre Classification
Isaac Rodríguez Bribiesca | Adrián Pastor López Monroy | Manuel Montes-y-Gómez
Proceedings of the Third Workshop on Multimodal Artificial Intelligence

The Multimodal Transformer has proven to be a competitive model for multimodal tasks involving textual, visual, and audio signals. However, as more modalities are involved, its late fusion by concatenation starts to have a negative impact on the model’s performance. In addition, interpreting the model’s predictions becomes difficult, as one would have to look at the different attention activation matrices. To overcome these shortcomings, we propose to perform late fusion by adding a GMU (Gated Multimodal Unit) module, which allows the model to weight modalities at the instance level, improving its performance while providing a better interpretability mechanism. In the experiments, we compare our proposed model (MulT-GMU) against the original implementation (MulT-Concat) and a SOTA model on a movie genre classification dataset. Our approach, MulT-GMU, outperforms both MulT-Concat and the previous SOTA model.
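For reference, the original two-modality Gated Multimodal Unit computes h_a = tanh(W_a x_a) and h_b = tanh(W_b x_b). It then computes a gate z = sigmoid(W_z [x_a; x_b]) and returns h = z * h_a + (1 - z) * h_b, with elementwise products. The plain-Python sketch below implements that form without biases. The abstract does not describe how MulT-GMU extends it to three modalities, and the weights in the example are arbitrary:

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gmu(x_a, x_b, w_a, w_b, w_z):
    """Two-modality Gated Multimodal Unit (biases omitted for brevity)."""
    h_a = [math.tanh(v) for v in matvec(w_a, x_a)]
    h_b = [math.tanh(v) for v in matvec(w_b, x_b)]
    # The gate sees both raw inputs and decides, per dimension, how much of each modality to use.
    z = [sigmoid(v) for v in matvec(w_z, x_a + x_b)]
    h = [zi * ha + (1.0 - zi) * hb for zi, ha, hb in zip(z, h_a, h_b)]
    return h, z


x_text, x_video = [1.0, 0.0], [0.0, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zero_gate = [[0.0] * 4, [0.0] * 4]  # neutral gate: z = 0.5 in both dimensions
h, z = gmu(x_text, x_video, identity, identity, zero_gate)
# z == [0.5, 0.5]; h == [0.5 * tanh(1), 0.5 * tanh(1)]
```

The gate values `z` are computed per instance. This is what allows the fused model to weight modalities per example and expose those weights for interpretation.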

Self-Contextualized Attention for Abusive Language Identification
Horacio Jarquín-Vásquez | Hugo Jair Escalante | Manuel Montes
Proceedings of the Ninth International Workshop on Natural Language Processing for Social Media

The use of attention mechanisms in deep learning approaches has become popular in natural language processing due to their outstanding performance. These mechanisms allow one to manage the importance of the elements of a sequence according to their context. However, this importance has been modeled independently, either between pairs of elements of a sequence (self-attention) or with respect to the application domain of a sequence (contextual attention), leading to the loss of relevant information and limiting the representation of the sequences. To tackle these issues, we propose the self-contextualized attention mechanism, which addresses these limitations by considering both the internal and the contextual relationships between the elements of a sequence. The proposed mechanism was evaluated on four standard collections for the abusive language identification task, achieving encouraging results. It outperformed current attention mechanisms and showed competitive performance with respect to state-of-the-art approaches.

UACH-INAOE at SMM4H: a BERT based approach for classification of COVID-19 Twitter posts
Alberto Valdes | Jesus Lopez | Manuel Montes
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

This work describes the participation of the Universidad Autónoma de Chihuahua - Instituto Nacional de Astrofísica, Óptica y Electrónica team in the Social Media Mining for Health Applications (SMM4H) 2021 shared task. Our team participated in tasks 5 and 6, both focused on the automatic classification of Twitter posts related to COVID-19. Task 5 was a binary classification problem aimed at identifying self-reporting tweets of potential COVID-19 cases. The objective of task 6 was to classify tweets containing COVID-19 symptoms. For both tasks, we used models based on bidirectional encoder representations from transformers (BERT). Our objective was to determine whether a model pretrained on a corpus in the domain of interest can outperform one trained on a much larger general-domain corpus. Our F1 results were encouraging: 0.77 and 0.95 for tasks 5 and 6, respectively. The latter was the highest score among all participants.

2020

A Deep Metric Learning Method for Biomedical Passage Retrieval
Andrés Rosso-Mateus | Fabio A. González | Manuel Montes-y-Gómez
Proceedings of the 28th International Conference on Computational Linguistics

Passage retrieval is the task of identifying text snippets that are valid answers to a question posed in natural language. One way to address this problem is to treat it as a metric learning problem, where we want to induce a metric between questions and passages that assigns smaller distances to more relevant passages. In this work, we present a novel method for passage retrieval that learns a metric for questions and passages based on their internal semantic interactions. The method follows an approach similar to that of triplet networks, where each training sample is composed of one anchor (the question) and a positive and a negative sample (passages). However, in contrast with triplet networks, the proposed method uses a novel deep architecture that better exploits the particularities of text and takes complementary relatedness measures into consideration. In addition, the paper presents a sampling strategy that selects both easy and hard negative samples, which improves the accuracy of the trained model. The method is particularly well suited for domain-specific passage retrieval, where it is very important to take different sources of information into account. The proposed approach was evaluated on a biomedical passage retrieval task, the BioASQ challenge, substantially outperforming the standard triplet loss by 10% and the state-of-the-art performance by 26%.
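The abstract reports gains over the standard triplet loss. That baseline can be sketched together with a simple hard-negative selection, which picks the negative closest to the anchor. Euclidean distance and a margin of 1.0 are assumptions here. This sketch covers neither the paper's deep architecture nor its mixed easy/hard sampling strategy:

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss: zero once the negative is at least
    `margin` farther from the anchor than the positive."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """Pick the negative closest to the anchor (a 'hard' negative)."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


question = [0.0, 0.0]
relevant = [1.0, 0.0]
irrelevant = [[3.0, 0.0], [0.0, 1.5]]
hard = hardest_negative(question, irrelevant)  # [0.0, 1.5]
loss = triplet_loss(question, relevant, hard)  # max(0, 1.0 - 1.5 + 1.0) = 0.5
```

The easy negative `[3.0, 0.0]` would give zero loss, because it is already more than a margin farther from the question than the relevant passage. Easy negatives like this contribute no gradient, which is why negative sampling matters for training.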

Automatic Detection of Offensive Language in Social Media: Defining Linguistic Criteria to build a Mexican Spanish Dataset
María José Díaz-Torres | Paulina Alejandra Morán-Méndez | Luis Villasenor-Pineda | Manuel Montes-y-Gómez | Juan Aguilera | Luis Meneses-Lerín
Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying

Phenomena such as bullying, homophobia, sexism, and racism have spread to social networks, motivating the development of tools for their automatic detection. The challenge becomes greater for languages rich in popular sayings, colloquial expressions, and idioms that may contain vulgar, profane, or rude words but are not always intended to offend, as is the case of Mexican Spanish. Under these circumstances, identifying an offense goes beyond the lexical and syntactic elements of the message. This first work aims to define the main linguistic features of aggressive, offensive, and vulgar language in social networks in order to establish linguistic criteria that facilitate the identification of abusive language. For this purpose, a Mexican Spanish Twitter corpus was compiled and analyzed. The dataset included words that, despite being rude, need to be considered in context to determine whether they are part of an offense. Based on the analysis of this corpus, linguistic criteria were defined to determine whether a message is offensive. To simplify the application of these criteria, an easy-to-follow diagram was designed. The paper presents an example of the use of the diagram, as well as the basic statistics of the corpus.

2019

Detecting Depression in Social Media using Fine-Grained Emotions
Mario Ezra Aragón | Adrian Pastor López-Monroy | Luis Carlos González-Gurrola | Manuel Montes-y-Gómez
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Nowadays, social media platforms are the most popular way for people to share information, from work issues to personal matters. For example, people with health disorders tend to share their concerns to seek advice and support or simply to relieve suffering. This provides a great opportunity to proactively detect these users and refer them to professional help as soon as possible. We propose a new representation called Bag of Sub-Emotions (BoSE), which represents social media documents by a set of fine-grained emotions automatically generated using a lexical resource of emotions and subword embeddings. The proposed representation is evaluated on the task of depression detection. The results are encouraging: the use of fine-grained emotions improved on a representation based on the core emotions and obtained competitive results compared to state-of-the-art approaches.

Jointly Learning Author and Annotated Character N-gram Embeddings: A Case Study in Literary Text
Suraj Maharjan | Deepthi Mave | Prasha Shrestha | Manuel Montes | Fabio A. González | Thamar Solorio
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

An author’s way of presenting a story through his/her writing style has a great impact on whether the story will be liked by readers or not. In this paper, we learn representations for authors of literary texts together with representations for character n-grams annotated with their functional roles. We train a neural character n-gram based language model using an external corpus of literary texts and transfer learned representations for use in downstream tasks. We show that augmenting the knowledge from external works of authors produces results competitive with other style-based methods for book likability prediction, genre classification, and authorship attribution.

2018

Early Text Classification Using Multi-Resolution Concept Representations
Adrian Pastor López-Monroy | Fabio A. González | Manuel Montes | Hugo Jair Escalante | Thamar Solorio
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

The intensive use of e-communications in everyday life has given rise to new threats and risks. When the vulnerable asset is the user, detecting these potential attacks before they cause serious damage is extremely important. This paper proposes a novel document representation to improve the early detection of risks in social media sources. The goal is to effectively identify the potential risk using as little text as possible and with as much anticipation as possible. Accordingly, we devise a Multi-Resolution Representation (MulR), which allows us to generate multiple “views” of the analyzed text. These views capture different semantic meanings for words and documents at different levels of detail, which is very useful in early scenarios for modeling variable amounts of evidence. Intuitively, the representation better captures the content of short documents (very early stages) at low resolutions, whereas large documents (medium/large stages) are better modeled at higher resolutions. We evaluate the proposed ideas in two different tasks where anticipation is critical: sexual predator detection and depression detection. The experimental evaluation for these early tasks revealed that the proposed approach outperforms previous methodologies by a considerable margin.

MindLab Neural Network Approach at BioASQ 6B
Andrés Rosso-Mateus | Fabio A. González | Manuel Montes-y-Gómez
Proceedings of the 6th BioASQ Workshop A challenge on large-scale biomedical semantic indexing and question answering

Biomedical question answering is concerned with the development of methods and systems that automatically find answers to questions posed in natural language. In this work, we describe the system used in the BioASQ Challenge task 6b for document retrieval and snippet retrieval (with particular emphasis on the latter subtask). The proposed model makes use of semantic similarity patterns that are evaluated and measured by a convolutional neural network architecture. The snippet ranking performance is then improved in a later step with a pseudo-relevance feedback approach. Based on the preliminary results, we reached second position in the snippet retrieval subtask.

INAOE-UPV at SemEval-2018 Task 3: An Ensemble Approach for Irony Detection in Twitter
Delia Irazú Hernández Farías | Fernando Sánchez-Vega | Manuel Montes-y-Gómez | Paolo Rosso
Proceedings of the 12th International Workshop on Semantic Evaluation

This paper describes an ensemble approach to SemEval-2018 Task 3. The proposed method combines two well-known text classification methods with a novel approach for capturing ironic content that exploits a lexicon tailored for irony detection. We experimented with different ensemble settings. The obtained results show that our method performs well at detecting the presence of ironic content in Twitter.

A Genre-Aware Attention Model to Improve the Likability Prediction of Books
Suraj Maharjan | Manuel Montes | Fabio A. González | Thamar Solorio
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Likability prediction of books has many uses. Readers, writers, and the publishing industry can all benefit from automatic book likability prediction systems. To make reliable decisions, these systems need to assimilate information from different aspects of a book in a sensible way. We propose a novel multimodal neural architecture that incorporates genre supervision to assign weights to individual feature types. Our proposed method can dynamically tailor the weights given to feature types based on the characteristics of each book. Our architecture achieves competitive results and even outperforms the state of the art for this task.

Letting Emotions Flow: Success Prediction by Modeling the Flow of Emotions in Books
Suraj Maharjan | Sudipta Kar | Manuel Montes | Fabio A. González | Thamar Solorio
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)

Books have the power to make us feel happiness, sadness, pain, surprise, or sorrow. An author’s dexterity in the use of these emotions captivates readers and makes it difficult for them to put the book down. In this paper, we model the flow of emotions over a book using recurrent neural networks and quantify its usefulness in predicting success in books. We obtained the best weighted F1-score of 69% for predicting books’ success in a multitask setting (simultaneously predicting success and genre of books).
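Both this paper and the multi-task likability work report weighted F1-scores. Under the usual definition, the one behind scikit-learn's `average="weighted"`, per-class F1 scores are averaged with weights proportional to each class's support. A dependency-free sketch with toy labels (the helper name `weighted_f1` is hypothetical):

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 over the classes in y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += (count / total) * f1
    return score


y_true = ["success", "success", "success", "failure"]
y_pred = ["success", "success", "failure", "failure"]
score = weighted_f1(y_true, y_pred)
# success: F1 = 0.8, weight 3/4; failure: F1 = 2/3, weight 1/4 -> 0.7667 (approx.)
```

Weighting by support keeps the score close to the majority class's F1 on imbalanced data, unlike a macro average.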

2017

Convolutional Neural Networks for Authorship Attribution of Short Texts
Prasha Shrestha | Sebastian Sierra | Fabio González | Manuel Montes | Paolo Rosso | Thamar Solorio
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

We present a model to perform authorship attribution of tweets using Convolutional Neural Networks (CNNs) over character n-grams. We also present a strategy that improves model interpretability by estimating the importance of input text fragments in the predicted classification. The experimental evaluation shows that text CNNs perform competitively and are able to outperform previous methods.
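A CNN over character n-grams first needs each text as a sequence of n-gram ids that an embedding layer can look up. The sketch below shows only that preprocessing step, assuming character bigrams and reserving id 0 for padding. The convolution and pooling layers are not shown, and neither are the paper's actual n-gram settings. `char_ngrams` and `encode` are hypothetical helpers:

```python
def char_ngrams(text, n=2):
    """All overlapping character n-grams of `text`, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encode(texts, n=2):
    """Map each text to a list of n-gram ids; id 0 is left free for padding."""
    vocab = {}
    encoded = []
    for text in texts:
        # Unseen n-grams receive the next free id, starting at 1.
        ids = [vocab.setdefault(gram, len(vocab) + 1) for gram in char_ngrams(text, n)]
        encoded.append(ids)
    return encoded, vocab


encoded, vocab = encode(["hello", "hell"])
# char_ngrams("hello") == ['he', 'el', 'll', 'lo']
# encoded == [[1, 2, 3, 4], [1, 2, 3]]
```

Working at the character level lets the model pick up spelling habits, emoticons, and other short-text style markers that word-level vocabularies tend to miss.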

A Multi-task Approach to Predict Likability of Books
Suraj Maharjan | John Arevalo | Manuel Montes | Fabio A. González | Thamar Solorio
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

We investigate the value of feature engineering and neural network models for predicting successful writing. Similar to previous work, we treat this as a binary classification task and explore new strategies to automatically learn representations from book contents. We evaluate our feature set on two different corpora created from Project Gutenberg books. The first presents a novel approach for generating the gold standard labels for the task and the other is based on prior research. Using a combination of hand-crafted and recurrent neural network learned representations in a dual learning setting, we obtain the best performance of 73.50% weighted F1-score.

2016

Early text classification: a Naïve solution
Hugo Jair Escalante | Manuel Montes y Gomez | Luis Villasenor | Marcelo Luis Errecalde
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Domain Adaptation for Authorship Attribution: Improved Structural Correspondence Learning
Upendra Sapkota | Thamar Solorio | Manuel Montes | Steven Bethard
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Not All Character N-grams Are Created Equal: A Study in Authorship Attribution
Upendra Sapkota | Steven Bethard | Manuel Montes | Thamar Solorio
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

Cross-Topic Authorship Attribution: Will Out-Of-Topic Data Help?
Upendra Sapkota | Thamar Solorio | Manuel Montes | Steven Bethard | Paolo Rosso
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

INAOE_UPV-CORE: Extracting Word Associations from Document Corpora to estimate Semantic Textual Similarity
Fernando Sánchez-Vega | Manuel Montes-y-Gómez | Paolo Rosso | Luis Villaseñor-Pineda
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

Using PU-Learning to Detect Deceptive Opinion Spam
Donato Hernández Fusilier | Rafael Guzmán Cabrera | Manuel Montes-y-Gómez | Paolo Rosso
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Sexual predator detection in chats with chained classifiers
Hugo Jair Escalante | Esaú Villatoro-Tello | Antonio Juárez | Manuel Montes-y-Gómez | Luis Villaseñor
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Exploring Word Class N-grams to Measure Language Development in Children
Gabriela Ramírez de la Rosa | Thamar Solorio | Manuel Montes | Yang Liu | Lisa Bedore | Elizabeth Peña | Aquiles Iglesias
Proceedings of the 2013 Workshop on Biomedical Natural Language Processing

2011

Local Histograms of Character N-grams for Authorship Attribution
Hugo Jair Escalante | Thamar Solorio | Manuel Montes-y-Gómez
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Modality Specific Meta Features for Authorship Attribution in Web Forum Posts
Thamar Solorio | Sangita Pillay | Sindhu Raghavan | Manuel Montes y Gómez
Proceedings of 5th International Joint Conference on Natural Language Processing

2008

Two Approaches for Multilingual Question Answering: Merging Passages vs. Merging Answers
Rita M. Aceves-Pérez | Manuel Montes-y-Gómez | Luis Villaseñor-Pineda | L. Alfonso Ureña-López
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 1, March 2008: Special Issue on Cross-Lingual Information Retrieval and Question Answering

2004

A Language Independent Method for Question Classification
Thamar Solorio | Manuel Pérez-Coutiño | Manuel Montes-y-Gómez | Luis Villaseñor-Pineda | Aurelio López-López
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Co-authors

Adrian Pastor Lopez Monroy 4

Suraj Maharjan 4

Mario Ezra Aragón 3

Steven Bethard 3

Upendra Sapkota 3

Horacio Jarquín-Vásquez 2

Andrés Rosso-Mateus 2

Fernando Sanchez-Vega 2

Prasha Shrestha 2

Rita M. Aceves-Pérez 1

Juan Aguilera 1

María José Díaz-Torres 1

Marcelo Luis Errecalde 1

Helena Gomez Adorno 1

Luis C. González 1

Luis Carlos González-Gurrola 1

Rafael Guzmán Cabrera 1

Delia Irazú Hernández Farías 1

Donato Hernández Fusilier 1

Aquiles Iglesias 1

Antonio Juárez 1

Yang Liu (刘扬) 1

David E. Losada 1

Aurelio López-López 1

Luis Meneses-Lerín 1

Paulina Alejandra Morán-Méndez 1

Elizabeth Peña 1

Sangita Pillay 1

Simone Paolo Ponzetto 1

Manuel Pérez-Coutiño 1

Sindhu Raghavan 1

Gabriela Ramírez de la Rosa 1

Isaac Rodríguez Bribiesca 1

Sebastian Sierra 1

Javier Sánchez-Junquera 1

L. Alfonso Urena Lopez 1

Alberto Valdes 1

Andric Valdez-Valenzuela 1

Esaú Villatoro-tello 1

Venues