Renata Vieira

Also published as: R. Vieira


2024

pdf bib
Named entity recognition specialised for Portuguese 18th-century History research
Joaquim Santos | Helena Freire Cameron | Fernanda Olival | Fátima Farrica | Renata Vieira
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1

pdf bib
Applying event classification to reveal the Estado da Índia
Gonçalo C. Albuquerque | Marlo Souza | Renata Vieira | Ana Sofia Ribeiro
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1

pdf bib
Decoding Sentiments about Migration in Portuguese Political Manifestos (2011, 2015, 2019)
Erik Bran Marino | Renata Vieira | Jesus Manuel Benitez Baleato | Ana Sofia Ribeiro | Katarina Laken
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 2

pdf bib
Analysing entity distribution in an annotated 18th-century historical source
Daniel De Los Reyes | Renata Vieira | Fernanda Olival | Helena Freire Cameron | Fátima Farrica
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 2

2022

pdf bib
BRATECA (Brazilian Tertiary Care Dataset): a Clinical Information Dataset for the Portuguese Language
Bernardo Consoli | Henrique D. P. dos Santos | Ana Helena D. P. S. Ulbrich | Renata Vieira | Rafael H. Bordini
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Computational medicine research requires clinical data for training and testing purposes, so the development of datasets composed of real hospital data is of utmost importance in this field. Most such data collections are in the English language, were collected in anglophone countries, and do not reflect other clinical realities, which increases the importance of national datasets for projects that hope to positively impact public health. This paper presents a new Brazilian Clinical Dataset containing over 70,000 admissions from 10 hospitals in two Brazilian states, composed of a total of over 2.5 million free-text clinical notes alongside data pertaining to patient information, prescription information, and exam results. This data was collected, organized, and deidentified, and is being distributed via credentialed access for use by the research community. In presenting the new dataset, this paper explores its structure, its population, and the potential benefits of using it in clinical AI tasks.

2021

pdf bib
Related Named Entities Classification in the Economic-Financial Context
Daniel De Los Reyes | Allan Barcelos | Renata Vieira | Isabel Manssour
Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

The present work uses the Bidirectional Encoder Representations from Transformers (BERT) to process a sentence and its entities and indicate whether two named entities present in a sentence are related or not, constituting a binary classification problem. It was developed for the Portuguese language, considering the financial domain and exploring deep linguistic representations to identify a relation between entities without using other lexical-semantic resources. The experimental results show a prediction accuracy of 86%.

2020

pdf bib
Embeddings for Named Entity Recognition in Geoscience Portuguese Literature
Bernardo Consoli | Joaquim Santos | Diogo Gomes | Fabio Cordeiro | Renata Vieira | Viviane Moreira
Proceedings of the Twelfth Language Resources and Evaluation Conference

This work focuses on Portuguese Named Entity Recognition (NER) in the Geology domain. The only domain-specific dataset in the Portuguese language annotated for NER is the GeoCorpus. Our approach relies on BiLSTM-CRF neural networks (a widely used type of network for this area of research) that use vector and tensor embedding representations. Three types of embedding models were used (Word Embeddings, Flair Embeddings, and Stacked Embeddings) under two versions (domain-specific and generalized). The domain-specific Flair Embeddings model was originally trained with a generalized context in mind, but was then fine-tuned with domain-specific Oil and Gas corpora, as there simply were not enough domain corpora to properly train such a model. Each of these embeddings was evaluated separately, as well as stacked with another embedding. Finally, we achieved state-of-the-art results for this domain with one of our embeddings, and we performed an error analysis on the language model that achieved the best results. Furthermore, we investigated the effects of domain-specific versus generalized embeddings.

pdf bib
Word Embedding Evaluation in Downstream Tasks and Semantic Analogies
Joaquim Santos | Bernardo Consoli | Renata Vieira
Proceedings of the Twelfth Language Resources and Evaluation Conference

Language Models have long been a prolific area of study in the field of Natural Language Processing (NLP). One of the newer kinds of language models, and some of the most used, are Word Embeddings (WE). WE are vector space representations of a vocabulary learned by a non-supervised neural network based on the context in which words appear. WE have been widely used in downstream tasks in many areas of study in NLP. These areas usually use these vector models as a feature in the processing of textual data. This paper presents the evaluation of newly released WE models for the Portuguese language, trained with a corpus composed of 4.9 billion tokens. The first evaluation presented an intrinsic task in which WEs had to correctly build semantic and syntactic relations. The second evaluation presented an extrinsic task in which the WE models were used in two downstream tasks: Named Entity Recognition and Semantic Similarity between Sentences. Our results show that a diverse and comprehensive corpus can often outperform a larger, less textually diverse corpus, and that batch training may cause quality loss in WE models.

2018

pdf bib
BlogSet-BR: A Brazilian Portuguese Blog Corpus
Henrique Santos | Vinicius Woloszyn | Renata Vieira
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

pdf bib
PLN-PUCRS at EmoInt-2017: Psycholinguistic features for emotion intensity prediction in tweets
Henrique Santos | Renata Vieira
Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Linguistic Inquiry and Word Count (LIWC) is a rich dictionary that maps words into several psychological categories such as Affective, Social, Cognitive, Perceptual and Biological processes. In this work, we have used LIWC psycholinguistic categories to train regression models and predict emotion intensity in tweets for the EmoInt-2017 task. Results show that LIWC features may boost emotion intensity prediction on the basis of a low dimension set.

pdf bib
Wheel of Life: an initial investigation. Topic-Related Polarity Visualization in Personal Stories
Henrique Santos | Renata Vieira | Greice Pinho | Jackson Pinheiro
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology

pdf bib
Processo de construção de um corpus anotado com Entidades Geológicas visando REN (Building an annotated corpus with geological entities for NER) [in Portuguese]
Daniela Amaral | Sandra Collovini | Anny Figueira | Renata Vieira | Marco Gonzalez
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology

2016

pdf bib
Adapting an Entity Centric Model for Portuguese Coreference Resolution
Evandro Fonseca | Renata Vieira | Aline Vanin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents the adaptation of an Entity Centric Model for Portuguese coreference resolution, considering 10 named entity categories. The model was evaluated on named entities using the HAREM Portuguese corpus, and the results are 81.0% precision and 58.3% recall overall. The resulting system is freely available.

pdf bib
A Sequence Model Approach to Relation Extraction in Portuguese
Sandra Collovini | Gabriel Machado | Renata Vieira
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The task of Relation Extraction from texts is one of the main challenges in the area of Information Extraction, considering the required linguistic knowledge and the sophistication of the language processing techniques employed. This task aims at identifying and classifying semantic relations that occur between entities recognized in a given text. In this paper, we evaluated a Conditional Random Fields classifier for the extraction of any relation descriptor occurring between named entities (Organisation, Person and Place categories), as well as pre-defined relation types between these entities in Portuguese texts.

pdf bib
Summ-it++: an Enriched Version of the Summ-it Corpus
Evandro Fonseca | André Antonitsch | Sandra Collovini | Daniela Amaral | Renata Vieira | Anny Figueira
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper presents Summ-it++, an enriched version of the Summ-it corpus. In this new version, the corpus has received new semantic layers, named entity categories and relations between named entities, adding to the previous coreference annotation. In addition, we change the original Summ-it format to SemEval.

2015

pdf bib
Comparative Analysis between Notations to Classify Named Entities using Conditional Random Fields
Daniela Oliveira F. do Amaral | Maiki Buffet | Renata Vieira
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology

pdf bib
Building and Applying Profiles Through Term Extraction
Lucelene Lopes | Renata Vieira
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology

2014

pdf bib
Building Domain Specific Bilingual Dictionaries
Lucas Hilgert | Lucelene Lopes | Artur Freitas | Renata Vieira | Denise Hogetop | Aline Vanin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper proposes a method to build bilingual dictionaries for specific domains defined by parallel corpora. The proposed method is based on an original method that is not domain specific. Both the original and the proposed methods are constructed with previously available natural language processing tools. Therefore, this paper's contribution resides in the choice and parametrization of the chosen tools. To illustrate the benefits of the proposed method, we conduct an experiment over technical manuals in English and Portuguese. The results of our proposed method were analyzed by human specialists, and they indicate significant increases in precision for unigrams and multi-grams. Numerically, the precision increase is as large as 15% according to our evaluation.

pdf bib
Comparative Analysis of Portuguese Named Entities Recognition Tools
Daniela Amaral | Evandro Fonseca | Lucelene Lopes | Renata Vieira
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper describes an experiment to compare four tools to recognize named entities in Portuguese texts. The experiment was carried out over the HAREM corpora, a gold standard for named entity recognition in Portuguese. The tools tested are based on natural language processing techniques and also machine learning. Specifically, one of the tools is based on Conditional Random Fields, an unsupervised machine learning model that has been used for named entity recognition in several languages, while the other tools follow more traditional natural language approaches. The comparison results indicate advantages for different tools according to the different classes of named entities. Despite this balance among tools, we conclude by pointing out foreseeable advantages of the machine learning based tool.

pdf bib
VOAR: A Visual and Integrated Ontology Alignment Environment
Bernardo Severo | Cassia Trojahn | Renata Vieira
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Ontology alignment is a key process for enabling interoperability between ontology-based systems in the Linked Open Data age. From two input ontologies, this process generates an alignment (set of correspondences) between them. In this paper we present VOAR, a new web-based environment for ontology alignment visualization and manipulation. Within this graphical environment, users can manually create/edit correspondences and apply a set of operations on alignments (filtering, merge, difference, etc.). VOAR allows invoking external ontology matching systems that implement a specific alignment interface, so that the generated alignments can be manipulated within the environment. Multiple alignments can also be evaluated together against a reference one, using classical evaluation metrics (precision, recall and f-measure). The status of each correspondence with respect to its presence or absence in the reference alignment is visually represented. Overall, the main new aspect of VOAR is the visualization and manipulation of alignments at schema level, in an integrated, visual and web-based environment.

2013

pdf bib
O Reconhecimento de Entidades Nomeadas por meio de Conditional Random Fields para a Língua Portuguesa (Named Entity Recognition with Conditional Random Fields for the Portuguese Language) [in Portuguese]
Daniela O. F. do Amaral | Renata Vieira
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology

pdf bib
Aplicando Pontos de Corte para Listas de Termos Extraídos (Applying Cut-off Points to Lists of Extracted Terms) [in Portuguese]
Lucelene Lopes | Renata Vieira
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology

pdf bib
Geração de features para resolução de correferência: Pessoa, Local e Organização (Feature Generation for Coreference Resolution: Person, Location and Organization) [in Portuguese]
Evandro B. Fonseca | Renata Vieira | Aline A. Vanin
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology

pdf bib
Entity-centric Sentiment Analysis on Twitter data for the Portuguese Language
Marlo Souza | Renata Vieira
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology

pdf bib
Extração de Vocabulário Multilíngue para Tradução em Domínios Especializados (Multilingual Vocabulary Extraction for Machine Translation in Specialized Domains) [in Portuguese]
Lucas Welter Hilgert | Renata Vieira
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology

2012

pdf bib
A Comparable Corpus Based on Aligned Multilingual Ontologies
Roger Granada | Lucelene Lopes | Carlos Ramisch | Cassia Trojahn | Renata Vieira | Aline Villavicencio
Proceedings of the First Workshop on Multilingual Modeling

pdf bib
A Fast, Memory Efficient, Scalable and Multilingual Dictionary Retriever
Paulo Fernandes | Lucelene Lopes | Carlos A. Prolo | Afonso Sales | Renata Vieira
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a novel approach to deal with dictionary retrieval. This new approach is based on a very efficient and scalable theoretical structure called Multi-Terminal Multi-valued Decision Diagrams (MTMDD). Such a tool allows the definition of very large, even multilingual, dictionaries without a significant increase in memory demands, and also with virtually no additional processing cost. Besides the general idea of the novel approach, this paper presents a description of the technologies involved, and their implementation in a software package called WAGGER. Finally, we also present some examples of usage and possible applications of this dictionary retriever.

pdf bib
Corpus+WordNet thesaurus generation for ontology enriching
Fernando Castilho | Roger Granada | Breno Meneghetti | Leonardo Carvalho | Renata Vieira
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a model to enrich an ontology with a thesaurus based on a domain corpus and WordNet. The model is applied to the data privacy domain and the initial domain resources comprise a data privacy ontology, a corpus of privacy laws, regulations and guidelines for projects. Based on these resources, a thesaurus is automatically generated. The thesaurus seeds are composed of the ontology concepts. For these seeds, similar terms are extracted from the corpus using known thesaurus generation methods. A filtering process searches for semantic relations between seeds and similar terms within WordNet. As a result, these semantic relations are used to expand the ontology with relations between them and related terms in the corpus. The resulting resource is a hierarchical structure that can help with ontology investigation and maintenance. The results allow the investigation of the domain knowledge with the support of semantic relations not present in the original ontology.

2011

pdf bib
Construction of a Portuguese Opinion Lexicon from multiple resources
Marlo Souza | Renata Vieira | Débora Busetti | Rove Chishman | Isa Mara Alves
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

pdf bib
Extração de Contextos Definitórios a partir de Textos em Língua Portuguesa (Extraction of Defining Contexts from Texts in Portuguese) [in Portuguese]
Igor S. Wendt | Renata Vieira
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

2010

pdf bib
An API for Multi-lingual Ontology Matching
Cássia Trojahn | Paulo Quaresma | Renata Vieira
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Ontology matching consists of generating a set of correspondences between the entities of two ontologies. This process is seen as a solution to data heterogeneity in ontology-based applications, enabling the interoperability between them. However, existing matching systems are designed by assuming that the entities of both source and target ontologies are written in the same language (English, for instance). Multi-lingual ontology matching is an open research issue. This paper describes an API for multi-lingual matching that implements two strategies, direct translation-based and indirect. The first strategy considers direct matching between two ontologies (i.e., without intermediary ontologies), with the help of external resources, i.e., translations. The indirect alignment strategy, proposed by Jung et al. (2009), is based on composition of alignments. We evaluate these strategies using simple string similarity based matchers and three ontologies written in English, French, and Portuguese, an extension of the OAEI benchmark test 206.

2008

pdf bib
A Framework for Multilingual Ontology Mapping
Cássia Trojahn | Paulo Quaresma | Renata Vieira
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In the field of ontology mapping, multilingual ontology mapping is an issue that is not well explored. This paper proposes a framework for mapping of multilingual Description Logics (DL) ontologies. First, the DL source ontology is translated to the target ontology language, using a lexical database or a dictionary, generating a DL translated ontology. The target and the translated ontologies are then used as input for the mapping process. The mappings are computed by specialized agents using different mapping approaches. Next, these agents use argumentation to exchange their local results, in order to agree on the obtained mappings. Based on their preferences and confidence of the arguments, the agents compute their preferred mapping sets. The arguments in such preferred sets are viewed as the set of globally acceptable arguments. A DL mapping ontology is generated as a result of the mapping process. In this paper we focus on the process of generating the DL translated ontology.

2006

pdf bib
Semantic tagging for resolution of indirect anaphora
R. Vieira | E. Bick | J. Coelho | V. Muller | S. Collovini | J. Souza | L. Rino
Proceedings of the 7th SIGdial Workshop on Discourse and Dialogue

2004

pdf bib
Using Word Similarity Lists for Resolving Indirect Anaphora
Caroline Gasperin | Renata Vieira
Proceedings of the Conference on Reference Resolution and Its Applications

pdf bib
Discourse-New Detectors for Definite Description Resolution: A Survey and a Preliminary Proposal
Massimo Poesio | Olga Uryupina | Renata Vieira | Mijail Alexandrov-Kabadjov | Rodrigo Goulart
Proceedings of the Conference on Reference Resolution and Its Applications

pdf bib
Mining Linguistically Interpreted Texts
Cassiana Fagundes da Silva | Renata Vieira | Fernando Santos Osório | Paulo Quaresma
Proceedings of the 5th International Workshop on Linguistically Interpreted Corpora

2003

pdf bib
From Concrete to Virtual Annotation Mark-up Language: The Case of COMMOn-REFs
Renata Vieira | Caroline Gasperin | Rodrigo Goulart | Susanne Salmon-Alt
Proceedings of the ACL 2003 Workshop on Linguistic Annotation: Getting the Model Right

2002

pdf bib
Acquiring Lexical Knowledge for Anaphora Resolution
Massimo Poesio | Tomonori Ishikawa | Sabine Schulte im Walde | Renata Vieira
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf bib
Nominal Expressions in Multilingual Corpora: Definites and Demonstratives
Susanne Salmon-Alt | Renata Vieira
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

pdf bib
An Empirically-based System for Processing Definite Descriptions
Renata Vieira | Massimo Poesio
Computational Linguistics, Volume 26, Number 4, December 2000

pdf bib
Corpus-based Development and Evaluation of a System for Processing Definite Descriptions
Renata Vieira | Massimo Poesio
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1998

pdf bib
A Corpus-based Investigation of Definite Description Use
Massimo Poesio | Renata Vieira
Computational Linguistics, Volume 24, Number 2, June 1998

1997

pdf bib
Resolving bridging references in unrestricted text
Massimo Poesio | Renata Vieira | Simone Teufel
Operational Factors in Practical, Robust Anaphora Resolution for Unrestricted Texts

pdf bib
Towards resolution of bridging descriptions
Renata Vieira | Simone Teufel
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics