Maria das Graças Volpe Nunes

Also published as: Maria das Graças V. Nunes, Maria das Graças Volpe Nunes, Maria das Gracas Volpe, Maria das Graças, Maria das Gracas Volpe Nunes, Maria das Graças Nunes


2025

Enhanced Universal Dependencies (EUD) serve as a crucial link between syntax and semantics. Beyond basic syntactic dependencies, EUD provide refined logical connections that are valuable for downstream tasks such as semantic role labeling, coreference resolution, information extraction, and question answering. The original EUD framework defines six types of relationships; this paper introduces an extension designed to address subject propagation in pro-drop languages. This “Extended EUD” proposal increases the number of relationships that may be annotated in a sentence, improving linguistic representation. Additionally, we report our experiments on a corpus of Portuguese (a pro-drop language), which we make publicly available to the research community.
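To make the idea of subject propagation concrete, the following is a minimal sketch of the conjunct subject propagation that standard EUD already defines: a verb coordinated with another verb inherits that verb's subject when it has none of its own. It does not reproduce the pro-drop extension proposed in the paper, whose rules the abstract does not specify. The function name `propagate_subjects`, the triple representation, and the example sentence are assumptions made for illustration.

```python
def propagate_subjects(basic_deps):
    """Add enhanced subject edges for conjoined predicates.

    basic_deps is a list of (head, relation, dependent) triples, a
    simplified stand-in for a parsed dependency tree (hypothetical format).
    """
    # Collect the subjects attached to each head in the basic tree.
    subjects_of = {}
    for head, rel, dep in basic_deps:
        if rel.startswith("nsubj"):
            subjects_of.setdefault(head, []).append((rel, dep))

    enhanced = list(basic_deps)
    for head, rel, dep in basic_deps:
        # A conjunct without its own subject inherits the subject(s)
        # of the first conjunct it is coordinated with.
        if rel == "conj" and dep not in subjects_of:
            for subj_rel, subj in subjects_of.get(head, []):
                enhanced.append((dep, subj_rel, subj))
    return enhanced


# "Maria chegou e saiu" ("Maria arrived and left")
BASIC = [
    ("chegou", "nsubj", "Maria"),
    ("chegou", "conj", "saiu"),
    ("saiu", "cc", "e"),
]
ENHANCED = propagate_subjects(BASIC)
```

Here `ENHANCED` keeps the three basic edges and adds one edge linking "saiu" to "Maria" as its subject. A pro-drop extension would additionally need to recover subjects in configurations where no overt subject is reachable through coordination, which this sketch does not attempt.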

2024

2023

2018

2016

Text normalization techniques based on rules, lexicons or supervised training on large corpora are neither scalable nor portable across domains, which makes them unsuitable for normalizing user-generated content (UGC). The tools currently available for Brazilian Portuguese rely on such techniques. In this work we propose a technique based on distributed representations of words (word embeddings), which represent words as continuous, high-dimensional numeric vectors. The vectors explicitly encode many linguistic regularities and patterns, as well as syntactic and semantic word relationships, and semantically similar words are represented by similar vectors. Based on these properties, we present a fully unsupervised, expandable, language- and domain-independent method for learning normalization lexicons from word embeddings. Our approach obtains a high correction rate for orthographic errors and internet slang in product reviews, outperforming the tools currently available for Brazilian Portuguese.
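The general idea of deriving a normalization lexicon from embeddings can be sketched as follows. This is an illustrative assumption, not the paper's actual procedure: each noisy word is mapped to the canonical word whose vector is most similar by cosine similarity, provided the two strings are also close in edit distance. The function names, thresholds, and three-dimensional toy vectors are hypothetical; real embeddings would be trained on a large corpus.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def edit_distance(s, t):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def build_lexicon(embeddings, canonical_words, min_similarity=0.8, max_distance=3):
    """Map each non-canonical word to its best canonical candidate, if any."""
    lexicon = {}
    for word, vector in embeddings.items():
        if word in canonical_words:
            continue
        best = None
        for candidate in canonical_words:
            if candidate not in embeddings:
                continue
            similarity = cosine(vector, embeddings[candidate])
            # Require both distributional and orthographic closeness.
            if similarity >= min_similarity and edit_distance(word, candidate) <= max_distance:
                if best is None or similarity > best[1]:
                    best = (candidate, similarity)
        if best is not None:
            lexicon[word] = best[0]
    return lexicon


# Toy vectors invented for illustration only.
TOY_EMBEDDINGS = {
    "você": [0.90, 0.10, 0.00],
    "vc": [0.88, 0.12, 0.01],
    "voce": [0.91, 0.09, 0.00],
    "não": [0.00, 0.90, 0.10],
    "nao": [0.02, 0.88, 0.12],
    "produto": [0.10, 0.00, 0.95],
}
CANONICAL = {"você", "não", "produto"}
LEXICON = build_lexicon(TOY_EMBEDDINGS, CANONICAL)
```

With these toy vectors, `LEXICON` maps "vc" and "voce" to "você" and "nao" to "não". Combining the two filters is a design choice: similarity alone would also pair synonyms, while edit distance alone would pair unrelated look-alike words.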

2015

2014

Web 2.0 has enabled a previously unimagined boom in communication. With the widespread use of computers and mobile devices, anyone, in practically any language, may post comments on the web. As a result, formal language is not necessarily used. In fact, in these communicative situations, language is marked by the absence of more complex syntactic structures and by the presence of internet slang, with missing diacritics, repeated vowels, chat-speak abbreviations, emoticons and colloquial expressions. Such language use poses severe new challenges for Natural Language Processing (NLP) tools and applications, which so far have focused on well-written texts. In this work, we report the construction of a large web corpus of product reviews in Brazilian Portuguese and the analysis of its lexical phenomena. This analysis supports the development of a lexical normalization tool that, in future work, will allow standard NLP products to be used for web opinion mining and summarization.

2010

2008

The Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets and inter-relating them by means of the so-called Inter-Lingual-Index, an unstructured list of the WN.Pr synsets. Another important initiative, relying on a slightly different method of building multilingual wordnets, is the MultiWordNet project, where the key strategy is to build language-specific wordnets while keeping as many as possible of the semantic relations available in WN.Pr. This paper stresses an additional advantage of using the WN.Pr lexical database as a resource for building wordnets for other languages: the possibility of implementing an automatic procedure that maps WN.Pr conceptual relations such as hyponymy, co-hyponymy, troponymy, meronymy, cause, and entailment onto the lexical database of the wordnet under construction. This is viable because these are language-independent relations that hold between lexicalized concepts, not between lexical units. Accordingly, combining methods from both initiatives, this paper presents the ongoing implementation of the WN.Br lexical database and the aforementioned automation procedure, illustrated with a sample of the automatic encoding of the hyponymy and co-hyponymy relations.
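The mapping of conceptual relations from one wordnet onto another can be illustrated with a small sketch. It assumes, hypothetically, that synsets of the two wordnets have already been aligned and that hyponymy is stored as (hyponym, hypernym) pairs; the abstract does not describe the actual WN.Br data structures or procedure. All identifiers and function names below are invented for illustration.

```python
from collections import defaultdict
from itertools import combinations


def project_hyponymy(source_hyponymy, alignment):
    """Carry (hyponym, hypernym) pairs over to the target wordnet.

    A pair is projected only when both synsets have an aligned
    counterpart in the target wordnet.
    """
    projected = set()
    for hyponym, hypernym in source_hyponymy:
        if hyponym in alignment and hypernym in alignment:
            projected.add((alignment[hyponym], alignment[hypernym]))
    return projected


def derive_co_hyponyms(hyponymy):
    """Co-hyponyms are distinct synsets that share a direct hypernym."""
    by_hypernym = defaultdict(set)
    for hyponym, hypernym in hyponymy:
        by_hypernym[hypernym].add(hyponym)
    pairs = set()
    for siblings in by_hypernym.values():
        for first, second in combinations(sorted(siblings), 2):
            pairs.add((first, second))
    return pairs


# Hypothetical synset identifiers and alignment.
SOURCE_HYPONYMY = {
    ("en:dog", "en:canine"),
    ("en:wolf", "en:canine"),
    ("en:canine", "en:carnivore"),
}
ALIGNMENT = {
    "en:dog": "pt:cachorro",
    "en:wolf": "pt:lobo",
    "en:canine": "pt:canídeo",
}
TARGET_HYPONYMY = project_hyponymy(SOURCE_HYPONYMY, ALIGNMENT)
TARGET_CO_HYPONYMS = derive_co_hyponyms(TARGET_HYPONYMY)
```

Because "en:carnivore" has no aligned synset in this toy alignment, the pair involving it is skipped rather than guessed. Co-hyponymy is derived from the projected hyponymy pairs instead of being mapped separately, which keeps the two relations consistent with each other.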

2007

2006

2002

2001

2000

1992