Helena Rodrigues Menezes de Oliveira Vaz
2026
Que ao mestre vai matá-lo? The evolution of prepositional accusatives in Portuguese across time
Helena Rodrigues Menezes de Oliveira Vaz
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
This work investigates Differential Object Marking (DOM) in Brazilian Portuguese (BP), specifically a-marked objects, or prepositional accusatives (PP-ACCs), across four variables: semantic requirements, constituent order, verb semantics, and textual genre. An optimized parsing model was trained to recognize instances of PP-ACCs and to automatically annotate historical documents from the Tycho Brahe and Colonia corpora for these objects. Contrary to expectations based on the low frequency of these objects and prior diachronic studies on European Portuguese (EP), our results reveal that PP-ACCs remain present in BP from the 18th century onward. Our findings confirm previous patterns for EP and present textual genre (specifically, narrative texts and theater plays) as a possibly relevant variable, though this warrants further investigation. Constituent order proved to be less significant than previously suggested. This work also reveals methodological challenges in using computational models and NLP tools for research in historical Portuguese.
2024
A Multilingual Parallel Corpus for Coreference Resolution and Information Status in the Literary Domain
Andrew Dyer | Ruveyda Betul Bahceci | Maryam Rajestari | Andreas Rouvalis | Aarushi Singhal | Yuliya Stodolinska | Syahidah Asma Umniyati | Helena Rodrigues Menezes de Oliveira Vaz
Proceedings of the 22nd Workshop on Treebanks and Linguistic Theories (TLT 2024)
Information status (the newness or givenness of referents in discourse) is known to affect the production of language at many different levels. At the morphosyntactic level, information status gives rise to special word orders, elisions, and other phenomena that challenge the notion that morphosyntax can be considered independent of discourse context. Though there are many language-specific corpora annotated for information status and its related phenomena, coreference and anaphora resolution, what is not available at present is a cross-lingually consistent annotated corpus or annotation scheme that would allow for comparative study of these phenomena across many diverse languages. In this paper we present our work to build such a resource. We are annotating a parsed, parallel corpus of prose in many languages for information status and coreference resolution, so that like-for-like cross-lingual comparisons can be made at the intersection of discourse and syntax. Our corpus can and will be used bot