Helena Rodrigues Menezes de Oliveira Vaz

Also published as: Helena Rodrigues Menezes de Oliveira Vaz


2026

This work investigates Differential Object Marking (DOM) in Brazilian Portuguese (BP), specifically a-marked objects, or prepositional accusatives (PP-ACCs), across four variables: semantic requirements, constituent order, verb semantics, and textual genre. An optimized parsing model was trained to recognize instances of PP-ACCs and to automatically annotate historical documents from the Tycho Brahe and Colonia corpora for these objects. Contrary to expectations based on the low frequency of these objects and on prior diachronic studies of European Portuguese (EP), our results reveal that PP-ACCs remain present in BP from the 18th century onward. Our findings confirm patterns previously reported for EP and suggest that textual genre (specifically, narrative texts and theater plays) may be a relevant variable, though this warrants further investigation. Constituent order proved to be less significant than previously suggested. This work also reveals methodological challenges in using computational models and NLP tools for research on historical Portuguese.

2024

Information status (the newness or givenness of referents in discourse) is known to affect the production of language at many different levels. At the morphosyntactic level, information status gives rise to special word orders, elisions, and other phenomena that challenge the notion that morphosyntax can be considered independent of discourse context. Though there are many language-specific corpora annotated for information status and its related phenomena, coreference and anaphora resolution, what is not available at present is a cross-lingually consistent annotated corpus or annotation scheme that would allow for comparative study of these phenomena across many diverse languages. In this paper we present our work to build such a resource. We are annotating a parsed, parallel corpus of prose in many languages for information status and coreference resolution, so that like-for-like cross-lingual comparisons can be made at the intersection of discourse and syntax. Our corpus can and will be used bot