Magali Sanches Duran

Also published as: Magali Sanches Duran, Magali Duran

2025

The revision of linguistic annotation in the Universal Dependencies framework: a look at the annotators’ behavior
Magali Sanches Duran | Lucelene Lopes | Thiago Alexandre Salgueiro Pardo
Proceedings of the 19th Linguistic Annotation Workshop (LAW-XIX-2025)

This paper presents strategies to revise an automatically annotated corpus according to the Universal Dependencies framework and discusses the lessons learned, mainly regarding the annotators' behavior. The revision strategies do not rely on examples from any specific language and, because they are language-independent, can be adopted in any language and corpus annotation initiative.

Extending the Enhanced Universal Dependencies – addressing subjects in pro-drop languages
Magali Sanches Duran | Elvis A. de Souza | Maria das Graças Volpe Nunes | Adriana Silvina Pagano | Thiago A. S. Pardo
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)

Enhanced Universal Dependencies (EUD) serve as a crucial link between syntax and semantics. Beyond basic syntactic dependencies, EUD provide refined logical connections that are valuable for downstream tasks such as semantic role labeling, coreference resolution, information extraction, and question answering. The original EUD framework defines six types of relationships, but this paper introduces an extension designed to address subject propagation in pro-drop languages. This "Extended EUD" proposal increases the number of relationships that may be annotated in sentences, improving linguistic representation. Additionally, we report our experiments on a corpus of Portuguese (a pro-drop language), which we make publicly available to the research community.

This paper presents PortiLexicon-UD, a large and freely available lexicon for Portuguese delivering morphosyntactic information according to the Universal Dependencies model. This lexical resource includes part-of-speech tags, lemmas, and morphological information for words, with 1,221,218 entries (counting a word once for each distinct combination of PoS tag, lemma, and morphological features). We report the lexicon creation process, its computational data structure, and its evaluation over an annotated corpus, showing that it has high language coverage and good-quality data.

2021

On auxiliary verb in Universal Dependencies: untangling the issue and proposing a systematized annotation strategy
Magali Duran | Adriana Pagano | Amanda Rassi | Thiago Pardo
Proceedings of the Sixth International Conference on Dependency Linguistics (Depling, SyntaxFest 2021)

Porttinari - a Large Multi-genre Treebank for Brazilian Portuguese
Thiago Pardo | Magali Duran | Lucelene Lopes | Ariani Di Felippo | Norton Roman | Maria das Graças Nunes
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

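Descrição de numerais segundo modelo Universal Dependencies e sua anotação no português (Description of Numerals according to the Universal Dependencies Model and Their Annotation in Portuguese) [in Portuguese]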
Magali Duran | Lucelene Lopes | Thiago Pardo
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology

2018

A Nontrivial Sentence Corpus for the Task of Sentence Readability Assessment in Portuguese
Sidney Evaldo Leal | Magali Sanches Duran | Sandra Maria Aluísio
Proceedings of the 27th International Conference on Computational Linguistics

Effective textual communication depends on readers being proficient enough to comprehend texts, and on texts being clear enough to be understood by the intended audience in a reading task. When the meaning of textual information and instructions is not well conveyed, many losses and damages may occur. Among the solutions to alleviate this problem is the automatic evaluation of sentence readability, a task that has been receiving a lot of attention due to its wide applicability. However, a shortage of resources, such as corpora for training and evaluation, hinders the full development of this task. In this paper, we generate a nontrivial sentence corpus in Portuguese. We evaluate three scenarios for building it, taking advantage of a parallel simplification corpus in which each sentence triplet is aligned and annotated with simplification operations, making it ideal for explaining possible mistakes of future methods. The best scenario of our corpus, PorSimplesSent, is composed of 4,888 pairs, which is bigger than a similar corpus for English; all three versions of it are publicly available. We created four baselines for PorSimplesSent and made available a pairwise ranking method, using 17 linguistic and psycholinguistic features, which correctly identifies the ranking of sentence pairs with an accuracy of 74.2%.

2015

Automatic Generation of a Lexical Resource to support Semantic Role Labeling in Portuguese
Magali Sanches Duran | Sandra Aluísio
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

A Normalizer for UGC in Brazilian Portuguese
Magali Sanches Duran | Maria das Graças Volpe Nunes | Lucas Avanço
Proceedings of the Workshop on Noisy User-generated Text

2014

Generating a Lexicon of Errors in Portuguese to Support an Error Identification System for Spanish Native Learners
Lianet Sepúlveda Torres | Magali Sanches Duran | Sandra Aluísio
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Portuguese is a less-resourced language with regard to foreign language learning. Aiming to inform a module of a system designed to support the scientific writing of native Spanish speakers learning Portuguese, we developed an approach to automatically generate a lexicon of wrong words that reproduces language transfer errors made by such foreign learners. Each item of the artificially generated lexicon contains, besides the wrong word, the respective correct Spanish and Portuguese words. The wrong word is used to identify the interlanguage error, and the correct Spanish and Portuguese forms are used to generate the suggestions. By keeping track of the correct word forms, we can provide corrections or, at least, useful suggestions for the learners. We propose to combine two automatic procedures to obtain the error correction: i) a similarity measure and ii) a translation algorithm based on an aligned parallel corpus. The similarity-based method achieved a precision of 52%, whereas the alignment-based method achieved a precision of 90%. In this paper we focus only on interlanguage errors involving suffixes that have different forms in the two languages. The approach, however, is very promising for tackling other types of errors, such as gender errors.

A Large Corpus of Product Reviews in Portuguese: Tackling Out-Of-Vocabulary Words
Nathan Hartmann | Lucas Avanço | Pedro Balage | Magali Duran | Maria das Graças Volpe Nunes | Thiago Pardo | Sandra Aluísio
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Web 2.0 has enabled a previously unimaginable communication boom. With the widespread use of computers and mobile devices, anyone, in practically any language, may post comments on the web. As such, formal language is not necessarily used. In fact, in these communicative situations, language is marked by the absence of more complex syntactic structures and the presence of internet slang, with missing diacritics, repeated vowels, and chat-speak abbreviations, emoticons, and colloquial expressions. Such language use poses severe new challenges for Natural Language Processing (NLP) tools and applications, which, so far, have focused on well-written texts. In this work, we report the construction of a large web corpus of product reviews in Brazilian Portuguese and the analysis of its lexical phenomena, which supports the development of a lexical normalization tool that, in future work, will enable the use of standard NLP tools for web opinion mining and summarization.

Some Issues on the Normalization of a Corpus of Products Reviews in Portuguese
Magali Sanches Duran | Lucas Avanço | Sandra Aluísio | Thiago Pardo | Maria da Graça Volpe Nunes
Proceedings of the 9th Web as Corpus Workshop (WaC-9)

2013

Identifying Pronominal Verbs: Towards Automatic Disambiguation of the Clitic ‘se’ in Portuguese
Magali Sanches Duran | Carolina Evaristo Scarton | Sandra Maria Aluísio | Carlos Ramisch
Proceedings of the 9th Workshop on Multiword Expressions

Um repositório de verbos para a anotação de papéis semânticos disponível na web (A Verb Repository for Semantic Role Labeling Available in the Web) [in Portuguese]
Magali Sanches Duran | Jhonata Pereira Martins | Sandra Maria Aluísio
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology

2012

Propbank-Br: a Brazilian Treebank annotated with semantic role labels
Magali Sanches Duran | Sandra Maria Aluísio
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper reports the annotation of a Brazilian Portuguese Treebank with semantic role labels following the Propbank guidelines. A different language and a different parser output affect the task and require some decisions on how to annotate the corpus. Therefore, a new annotation guide, called Propbank-Br, has been created to deal with language-specific phenomena and parser problems. In this phase of the project, the corpus was annotated by a single linguist. The annotation task reported here is part of a larger project for Brazilian Portuguese. This project aims to build Brazilian verb frame files and a broader, distributed annotation of semantic role labels in Brazilian Portuguese, allowing inter-annotator agreement to be measured. The corpus, available on the web, is already being used to build a semantic tagger for Portuguese.

2011

Identifying and Analyzing Brazilian Portuguese Complex Predicates
Magali Sanches Duran | Carlos Ramisch | Sandra Maria Aluísio | Aline Villavicencio
Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World

Propbank-Br: a Brazilian Portuguese corpus annotated with semantic role labels
Magali Sanches Duran | Sandra Maria Aluísio
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

2010

Assigning Wh-Questions to Verbal Arguments: Annotation Tools Evaluation and Corpus Building
Magali Sanches Duran | Marcelo Adriano Amâncio | Sandra Maria Aluísio
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This work reports the evaluation and selection of annotation tools to assign wh-question labels to verbal arguments in a sentence. The wh-question assignment discussed herein is a kind of semantic annotation that involves two tasks: delimiting verbs and arguments, and linking verbs to their arguments by question labels. As it is a new type of semantic annotation, there are no reports on the requirements an annotation tool should meet to support it. For this reason, we decided to select the most appropriate tool in two phases. In the first phase, we executed the task with an annotation tool we had previously used for another task. This phase helped us to test the task and showed us which features were or were not desirable in an annotation tool for our purpose. In the second phase, guided by these requirements, we evaluated several tools and selected one for the real task. After concluding the corpus annotation, we report some of the annotation results and some comments on the improvements that should be made in an annotation tool to better support this kind of annotation task.
