Rafael Muñoz

Also published as: R. Muñoz, Rafael Muñoz Guillena, Rafael Muñoz-Guillena


2023

A Review in Knowledge Extraction from Knowledge Bases
Fabio Yanez | Andrés Montoyo | Yoan Gutierrez | Rafael Muñoz | Armando Suarez
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

Generative language models achieve the state of the art in many tasks within natural language processing (NLP). Although these models correctly capture syntactic information, they fail to interpret knowledge (semantics). Moreover, the lack of interpretability of these models promotes the use of other technologies as a replacement or complement to generative language models. This is the case with research focused on incorporating knowledge by resorting to knowledge bases mainly in the form of graphs. The generation of large knowledge graphs is carried out with unsupervised or semi-supervised techniques, which promotes the validation of this knowledge with the same type of techniques due to the size of the generated databases. In this review, we will explain the different techniques used to test and infer knowledge from graph structures with machine learning algorithms. The motivation of validating and inferring knowledge is to use correct knowledge in subsequent tasks with improved embeddings.

T2KG: Transforming Multimodal Document to Knowledge Graph
Santiago Galiano | Rafael Muñoz | Yoan Gutiérrez | Andrés Montoyo | Jose Ignacio Abreu | Luis Alfonso Ureña
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

The large amount of information in digital format that exists today makes it unfeasible to use manual means to acquire the knowledge contained in these documents. Therefore, it is necessary to develop tools that allow us to incorporate this knowledge into a structure that is easy to use by both machines and humans. This paper presents a system that can incorporate the relevant information from a document in any format, structured or unstructured, into a semantic network that represents the existing knowledge in the document. The system independently handles the full range of inputs, from structured documents, which it processes according to their annotation scheme, to unstructured documents written in natural language, for which it uses a set of sensors that identify the relevant information and subsequently incorporate it into the semantic network, which is built by linking all the information based on the knowledge discovered.

2021

Active Learning for Assisted Corpus Construction: A Case Study in Knowledge Discovery from Biomedical Text
Hian Cañizares-Díaz | Alejandro Piad-Morffis | Suilan Estevez-Velarde | Yoan Gutiérrez | Yudivián Almeida Cruz | Andres Montoyo | Rafael Muñoz-Guillena
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This paper presents an active learning approach that aims to reduce the human effort required during the annotation of natural language corpora composed of entities and semantic relations. Our approach assists human annotators by intelligently selecting the most informative sentences to annotate and then pre-annotating them with a few highly accurate entities and semantic relations. We define an uncertainty-based query strategy with a weighted density factor, using similarity metrics based on sentence embeddings. As a case study, we evaluate our approach via simulation in a biomedical corpus and estimate the potential reduction in total annotation time. Experimental results suggest that the query strategy reduces by between 35% and 40% the number of sentences that must be manually annotated to develop systems able to reach a target F1 score, while the pre-annotation strategy produces an additional 24% reduction in the total annotation time. Overall, our preliminary experiments suggest that as much as 60% of the annotation time could be saved while producing corpora that have the same usefulness for training machine learning algorithms. An open-source computational tool that implements the aforementioned strategies is presented and published online for the research community.

Knowledge Discovery in COVID-19 Research Literature
Ernesto L. Estevanell-Valladares | Suilan Estevez-Velarde | Alejandro Piad-Morffis | Yoan Gutierrez | Andres Montoyo | Rafael Muñoz | Yudivián Almeida Cruz
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

This paper presents the preliminary results of an ongoing project that analyzes the growing body of scientific research published around the COVID-19 pandemic. In this research, a general-purpose semantic model is used to double annotate a batch of 500 sentences that were manually selected from the CORD-19 corpus. Afterwards, a baseline text-mining pipeline is designed and evaluated via a large batch of 100,959 sentences. We present a qualitative analysis of the most interesting facts automatically extracted and highlight possible future lines of development. The preliminary results show that general-purpose semantic models are a useful tool for discovering fine-grained knowledge in large corpora of scientific documents.

2020

Knowledge Discovery in COVID-19 Research Literature
Alejandro Piad-Morffis | Suilan Estevez-Velarde | Ernesto Luis Estevanell-Valladares | Yoan Gutiérrez | Andrés Montoyo | Rafael Muñoz | Yudivián Almeida-Cruz
Proceedings of the 1st Workshop on NLP for COVID-19 (Part 2) at EMNLP 2020

This paper presents the preliminary results of an ongoing project that analyzes the growing body of scientific research published around the COVID-19 pandemic. In this research, a general-purpose semantic model is used to double annotate a batch of 500 sentences that were manually selected by the researchers from the CORD-19 corpus. Afterwards, a baseline text-mining pipeline is designed and evaluated via a large batch of 100,959 sentences. We present a qualitative analysis of the most interesting facts automatically extracted and highlight possible future lines of development. The preliminary results show that general-purpose semantic models are a useful tool for discovering fine-grained knowledge in large corpora of scientific documents.

Demo Application for the AutoGOAL Framework
Suilan Estevez-Velarde | Alejandro Piad-Morffis | Yoan Gutiérrez | Andres Montoyo | Rafael Muñoz-Guillena | Yudivián Almeida Cruz
Proceedings of the 28th International Conference on Computational Linguistics: System Demonstrations

This paper introduces a web demo that showcases the main characteristics of the AutoGOAL framework. AutoGOAL is a framework in Python for automatically finding the best way to solve a given task. It has been designed mainly for automatic machine learning (AutoML), but it can be used in any scenario where several possible strategies are available to solve a given computational task. In contrast with alternative frameworks, AutoGOAL can be applied seamlessly to Natural Language Processing as well as structured classification problems. This paper presents an overview of the framework’s design and experimental evaluation in several machine learning problems, including two recent NLP challenges. The accompanying software demo is available online (https://autogoal.github.io/demo) and full source code is provided under the MIT open-source license (https://autogoal.github.io).

2019

Demo Application for LETO: Learning Engine Through Ontologies
Suilan Estevez-Velarde | Andrés Montoyo | Yudivian Almeida-Cruz | Yoan Gutiérrez | Alejandro Piad-Morffis | Rafael Muñoz
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

The massive amount of multi-formatted information available on the Web necessitates the design of software systems that leverage this information to obtain knowledge that is valid and useful. The main challenge is to discover relevant information and continuously update, enrich and integrate knowledge from various sources of structured and unstructured data. This paper presents the Learning Engine Through Ontologies (LETO) framework, an architecture for the continuous and incremental discovery of knowledge from multiple sources of unstructured and structured data. We justify the main design decision behind LETO’s architecture and evaluate the framework’s feasibility using the Internet Movie Database (IMDB) and Twitter as a practical application.

A Neural Network Component for Knowledge-Based Semantic Representations of Text
Alejandro Piad-Morffis | Rafael Muñoz | Yoan Gutiérrez | Yudivian Almeida-Cruz | Suilan Estevez-Velarde | Andrés Montoyo
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

This paper presents Semantic Neural Networks (SNNs), a knowledge-aware component based on deep learning. SNNs can be trained to encode explicit semantic knowledge from an arbitrary knowledge base, and can subsequently be combined with other deep learning architectures. At prediction time, SNNs provide a semantic encoding extracted from the input data, which can be exploited by other neural network components to build extended representation models that can face alternative problems. The SNN architecture is defined in terms of the concepts and relations present in a knowledge base. Based on this architecture, a training procedure is developed. Finally, an experimental setup is presented to illustrate the behaviour and performance of an SNN for a specific NLP problem, in this case, opinion mining for the classification of movie reviews.

A General-Purpose Annotation Model for Knowledge Discovery: Case Study in Spanish Clinical Text
Alejandro Piad-Morffis | Yoan Gutiérrez | Suilan Estevez-Velarde | Rafael Muñoz
Proceedings of the 2nd Clinical Natural Language Processing Workshop

Knowledge discovery from text in natural language is a task usually aided by the manual construction of annotated corpora. Specifically in the clinical domain, several annotation models are used depending on the characteristics of the task to solve (e.g., named entity recognition, relation extraction, etc.). However, few general-purpose annotation models exist that can support a broad range of knowledge extraction tasks. This paper presents an annotation model designed to capture a large portion of the semantics of natural language text. The structure of the annotation model is presented, with examples of annotated sentences and a brief description of each semantic role and relation defined. This research focuses on an application to clinical texts in the Spanish language. Nevertheless, the presented annotation model is extensible to other domains and languages. An example of annotated sentences, guidelines, and suitable configuration files for an annotation tool are also provided for the research community.

2017

Natural Language Processing Technologies for Document Profiling
Antonio Guillén | Yoan Gutiérrez | Rafael Muñoz
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

Nowadays, searching for documents on the Internet is becoming increasingly difficult. The reason is the amount of content published by users (articles, comments, blogs, reviews). How can we help users find the documents they need? What would be necessary to provide useful document meta-data for supporting search engines? In this article, we present a study of some Natural Language Processing (NLP) technologies that can be useful for facilitating the proper identification of documents according to user needs. For this purpose, a document profile is designed that can represent semantic meta-data extracted from documents by using NLP technologies. The research focuses on the study of different NLP technologies in order to support the creation of our novel document profile proposal from a semantic perspective.

2015

Authorship Verification, Average Similarity Analysis
Daniel Castro Castro | Yaritza Adame Arcia | María Pelaez Brioso | Rafael Muñoz Guillena
Proceedings of the International Conference Recent Advances in Natural Language Processing

Pattern Construction for Extracting Domain Terminology
Yusney Marrero García | Paloma Moreda Pozo | Rafael Muñoz-Guillena
Proceedings of the International Conference Recent Advances in Natural Language Processing

2014

UMCC_DLSI_SemSim: Multilingual System for Measuring Semantic Textual Similarity
Alexander Chávez | Héctor Dávila | Yoan Gutiérrez | Antonio Fernández-Orquín | Andrés Montoyo | Rafael Muñoz
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

UMCC_DLSI: A Probabilistic Automata for Aspect Based Sentiment Analysis
Yenier Castañeda | Armando Collazo | Elvis Crego | Jorge L. Garcia | Yoan Gutiérrez | David Tomás | Andrés Montoyo | Rafael Muñoz
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

UMCC_DLSI: Sentiment Analysis in Twitter using Polirity Lexicons and Tweet Similarity
Pedro Aniel Sánchez-Mirabal | Yarelis Ruano Torres | Suilen Hernández Alvarado | Yoan Gutiérrez | Andrés Montoyo | Rafael Muñoz
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

RA-SR: Using a ranking algorithm to automatically building resources for subjectivity analysis over annotated corpora
Yoan Gutiérrez | Andy González | Antonio Fernández | Andrés Montoyo | Rafael Muñoz
Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

UMCC_DLSI: Textual Similarity based on Lexical-Semantic features
Alexander Chávez | Héctor Dávila | Yoan Gutiérrez | Armando Collazo | José I. Abreu | Antonio Fernández Orquín | Andrés Montoyo | Rafael Muñoz
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

UMCC_DLSI-(EPS): Paraphrases Detection Based on Semantic Distance
Héctor Dávila | Antonio Fernández Orquín | Alexander Chávez | Yoan Gutiérrez | Armando Collazo | José I. Abreu | Andrés Montoyo | Rafael Muñoz
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

UMCC_DLSI: Reinforcing a Ranking Algorithm with Sense Frequencies and Multidimensional Semantic Resources to solve Multilingual Word Sense Disambiguation
Yoan Gutiérrez | Yenier Castañeda | Andy González | Rainel Estrada | Dennys D. Puig | Jose I. Abreu | Roger Pérez | Antonio Fernández Orquín | Andrés Montoyo | Rafael Muñoz | Franc Camara
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

UMCC_DLSI-(SA): Using a ranking algorithm and informal features to solve Sentiment Analysis in Twitter
Yoan Gutiérrez | Andy González | Roger Pérez | José I. Abreu | Antonio Fernández Orquín | Alejandro Mosquera | Andrés Montoyo | Rafael Muñoz | Franc Camara
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

UMCC_DLSI: Semantic and Lexical features for detection and classification Drugs in biomedical texts
Armando Collazo | Alberto Ceballo | Dennys D. Puig | Yoan Gutiérrez | José I. Abreu | Roger Pérez | Antonio Fernández Orquín | Andrés Montoyo | Rafael Muñoz | Franc Camara
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)

2012

UMCC_DLSI: Multidimensional Lexical-Semantic Textual Similarity
Antonio Fernández | Yoan Gutiérrez | Héctor Dávila | Alexander Chávez | Andy González | Rainel Estrada | Yenier Castañeda | Sonia Vázquez | Andrés Montoyo | Rafael Muñoz
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

2011

Investigating Advanced Techniques for Document Content Similarity Applied to External Plagiarism Analysis
Daniel Micol | Rafael Muñoz | Óscar Ferrández
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

Hybrid System For Plagiarism Detection
Javier R. Bru | Patricio Martínez-Barco | Rafael Muñoz
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Aligning FrameNet and WordNet based on Semantic Neighborhoods
Óscar Ferrández | Michael Ellsworth | Rafael Muñoz | Collin F. Baker
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents an algorithm for aligning FrameNet lexical units to WordNet synsets. Both FrameNet and WordNet are well-known resources that are widely used by the research community. They help systems in the comprehension of the semantics of texts, and therefore, finding strategies to link FrameNet and WordNet involves challenges related to a better understanding of human language. Such deep analysis is exploited by researchers to improve the performance of their applications. The alignment is achieved by exploiting the particular characteristics of each lexical-semantic resource, with special emphasis on the explicit, formal semantic relations in each. Semantic neighborhoods are computed for each alignment of lemmas, and the algorithm calculates correlation scores by comparing such neighborhoods. The results suggest that the proposed algorithm is appropriate for aligning the FrameNet and WordNet hierarchies. Furthermore, the algorithm can aid research on increasing the coverage of FrameNet, building FrameNets in other languages, and creating a system for querying a joint FrameNet-WordNet hierarchy.

2009

A Study on Linking Wikipedia Categories to Wordnet Synsets using Text Similarity
Antonio Toral | Óscar Ferrández | Eneko Agirre | Rafael Muñoz
Proceedings of the International Conference RANLP-2009

2008

Named Entity WordNet
Antonio Toral | Rafael Muñoz | Monica Monachini
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents the automatic extension of Princeton WordNet with Named Entities (NEs). This new resource is called Named Entity WordNet. Our method maps the noun is-a hierarchy of WordNet to Wikipedia categories, identifies the NEs present in the latter and extracts different information from them such as written variants, definitions, etc. This information is inserted into an NE repository. A module that converts from this generic repository to the WordNet-specific format has been developed. The paper explores different aspects of our methodology such as the treatment of polysemous terms, the identification of hyponyms within the Wikipedia categorization system, the identification of Wikipedia articles which are NEs and the design of an NE repository compliant with the LMF ISO standard. So far, this procedure enriches WordNet with 310,742 NEs and 381,043 “instance of” relations.

2007

DLSITE-2: Semantic Similarity Based on Syntactic Dependency Trees Applied to Textual Entailment
Daniel Micol | Óscar Ferrández | Rafael Muñoz | Manuel Palomar
Proceedings of the Second Workshop on TextGraphs: Graph-Based Algorithms for Natural Language Processing

A Perspective-Based Approach for Solving Textual Entailment Recognition
Óscar Ferrández | Daniel Micol | Rafael Muñoz | Manuel Palomar
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

Proceedings of the Workshop on Annotating and Reasoning about Time and Events
Branimir Boguraev | Rafael Muñoz | James Pustejovsky
Proceedings of the Workshop on Annotating and Reasoning about Time and Events

Evaluating Knowledge-based Approaches to the Multilingual Extension of a Temporal Expression Normalizer
Matteo Negri | Estela Saquete | Patricio Martínez-Barco | Rafael Muñoz
Proceedings of the Workshop on Annotating and Reasoning about Time and Events

Multilingual Extension of a Temporal Expression Normalizer using Annotated Corpora
E. Saquete | P. Martínez-Barco | R. Muñoz | M. Negri | M. Speranza | R. Sprugnoli
Proceedings of the Cross-Language Knowledge Induction Workshop

A proposal to automatically build and maintain gazetteers for Named Entity Recognition by using Wikipedia
Antonio Toral | Rafael Muñoz
Proceedings of the Workshop on NEW TEXT Wikis and blogs and other dynamic text sources

2004

Splitting Complex Temporal Questions for Question Answering Systems
E. Saquete | P. Martínez-Barco | R. Muñoz | J.L. Vicedo
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2002

Bilingual alignment of anaphoric expressions
R. Muñoz | R. Mitkov | M. Palomar | J. Peral | R. Evans | L. Moreno | C. Orasan | M. Saiz-Noeda | A. Ferrández | C. Barbu | P. Martínez-Barco | A. Suárez
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

An Algorithm for Anaphora Resolution in Spanish Texts
Manuel Palomar | Antonio Ferrández | Lidia Moreno | Patricio Martínez-Barco | Jesús Peral | Maximiliano Saiz-Noeda | Rafael Muñoz
Computational Linguistics, Volume 27, Number 4, December 2001

2000

Semantic approach to bridging reference resolution
Rafael Muñoz | Maximiliano Saiz-Noeda | Armando Suárez | Manuel Palomar
Proceedings of the International Conference on Machine Translation and Multilingual Applications in the new Millennium: MT 2000