Livy Real
2026
Towards a Universal Dependencies Corpus for Portuguese Epidemiological Reports
Christian Freitas | Livy Real | Lilian Berton | Valeria de Paiva
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2
We present an ongoing research project focused on the construction of a Universal Dependencies (UD) corpus of Portuguese epidemiological reports derived from documents published within the Brazilian public health system. We describe findings and challenges in building such a corpus from PDF reports processed through a controlled document extraction pipeline that contrasts layout-aware extraction with raw PDF text extraction, explicitly addressing the impact of tabular content on downstream syntactic analysis. Narrative text is annotated using multiple UD parsers for Portuguese, including widely used and state-of-the-art tools, and their outputs are systematically compared using descriptive structural indicators and targeted qualitative inspection. Our analysis highlights domain-specific challenges in epidemiological texts and shows that document extraction and representation choices have a stronger effect on parsing behavior than parser selection alone. Based on these findings, we identify robust preprocessing configurations and discuss design choices for a UD epidemiological corpus to support future research on syntactic parsing, domain adaptation, and downstream natural language processing tasks in epidemiology and public health.
Textual Inference in Portuguese: Comparing Language Models
Fabiana Avais | Valeria de Paiva | Livy Real
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2
Large language models (LLMs) are increasingly used for Natural Language Inference (NLI), yet their ability to perform logic-sensitive semantic reasoning, especially outside English, remains underexplored. This paper presents a preliminary investigation into the feasibility and usefulness of developing FraCaS-BR, a Portuguese adaptation of the FraCaS benchmark for semantic inference. Using a small diagnostic subset of seven FraCaS problems focusing on generalized quantifiers, plurals, and nominal anaphora, we evaluate the behavior of three LLMs (ChatGPT, Maritalk, and Evaristo) on Brazilian Portuguese translations. Each problem is submitted multiple times to assess correctness, variance, and consistency relative to the original FraCaS gold labels. The results reveal systematic differences across models. While ChatGPT shows higher overall correctness and stability, all models exhibit limitations that undermine their reliability on logic-controlled inference tasks. The extent of manual correction required during translation further underscores the necessity of human-in-the-loop evaluation. Taken together, these findings support and motivate the development of FraCaS-BR as a controlled evaluation resource for assessing semantic reasoning in Portuguese.
2024
RePro: A Benchmark Dataset for Opinion Mining in Brazilian Portuguese
Lucas Nildaimon dos Santos Silva | Ana Cláudia Zandavalle | Carolina Francisco Gadelha Rodrigues | Tatiana da Silva Gama | Fernando Guedes Souza | Phillipe Derwich Silva Zaidan | Alice Florencio Severino da Silva | Karina Soares | Livy Real
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 2
Pablo Gamallo | Daniela Claro | António Teixeira | Livy Real | Marcos Garcia | Hugo Gonçalo Oliveira | Raquel Amaro
Brazilian Portuguese Product Reviews Moderation with AutoML
Lucas Nildaimon dos Santos Silva | Livy Real | Fernando Rezende Zagatti | Ana Claudia Bianchini Zandavalle | Tatiana da Silva Gama | Carolina Francisco Gadelha Rodrigues
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
Proceedings of the 16th International Conference on Computational Processing of Portuguese - Vol. 1
Pablo Gamallo | Daniela Claro | António Teixeira | Livy Real | Marcos Garcia | Hugo Gonçalo Oliveira | Raquel Amaro
Automated Topic Annotation in Brazilian Product Reviews: A Case Study of Adversarial Examples with Sabia-3
Lucas Nildaimon dos Santos Silva | Livy Real
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology
Getting Logic From LLMs: Annotating Natural Language Inference with Sabiá
Fabiana Avais | Marcos Carreira | Livy Real
Proceedings of the 15th Brazilian Symposium in Information and Human Language Technology
2022
mwetoolkit-lib: Adaptation of the mwetoolkit as a Python Library and an Application to MWE-based Document Clustering
Fernando Zagatti | Paulo Augusto de Lima Medeiros | Esther da Cunha Soares | Lucas Nildaimon dos Santos Silva | Carlos Ramisch | Livy Real
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
This paper introduces the mwetoolkit-lib, an adaptation of the mwetoolkit as a Python library. The original toolkit performs the extraction and identification of multiword expressions (MWEs) in large text bases through the command line. One of the contributions of our work is the adaptation of the MWE extraction pipeline from the mwetoolkit, allowing its usage in Python development environments and integration in larger pipelines. The other contribution is the execution of a pilot experiment aiming to show the impact of MWE discovery in data professionals’ work. This experiment found that the addition of MWE knowledge to the Term Frequency-Inverse Document Frequency (TF-IDF) vectorization altered the word relevance order, improving the linguistic quality of the clusters returned by the k-means method.
2021
Relation extraction in structured and unstructured data: a comparative investigation on smartphone titles in the e-commerce domain
João Barbirato | Livy Real | Helena Caseli
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology
Classificação multimodal para detecção de produtos proibidos em uma plataforma marketplace (Multimodal classification for detecting prohibited products on a marketplace platform)
Alan Romualdo | Livy Real | Helena Caseli
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology
Annotation Difficulties in Natural Language Inference
Aikaterini-Lida Kalouli | Livy Real | Annebeth Buis | Martha Palmer | Valeria Paiva
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology
Measuring Brazilian Portuguese Product Titles Similarity using Embeddings
Alan Romualdo | Livy Real | Helena Caseli
Proceedings of the 13th Brazilian Symposium in Information and Human Language Technology
2019
Explaining Simple Natural Language Inference
Aikaterini-Lida Kalouli | Annebeth Buis | Livy Real | Martha Palmer | Valeria de Paiva
Proceedings of the 13th Linguistic Annotation Workshop
The vast amount of research introducing new corpora and techniques for semi-automatically annotating corpora shows the important role that datasets play in today’s research, especially in the machine learning community. This rapid development raises concerns about the quality of the datasets created and consequently of the models trained, as recently discussed with respect to the Natural Language Inference (NLI) task. In this work we conduct an annotation experiment based on a small subset of the SICK corpus. The experiment reveals several problems in the annotation guidelines, and various challenges of the NLI task itself. Our quantitative evaluation of the experiment allows us to assign our empirical observations to specific linguistic phenomena and leads us to recommendations for future annotation tasks, for NLI and possibly for other tasks.
2017
Textual Inference: getting logic from humans
Aikaterini-Lida Kalouli | Livy Real | Valeria de Paiva
Proceedings of the 12th International Conference on Computational Semantics (IWCS) — Short papers
Universal Dependencies for Portuguese
Alexandre Rademaker | Fabricio Chalub | Livy Real | Cláudia Freitas | Eckhard Bick | Valeria de Paiva
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)
Correcting Contradictions
Aikaterini-Lida Kalouli | Valeria de Paiva | Livy Real
Proceedings of the Computing Natural Language Inference Workshop
2016
Semantic Links for Portuguese
Fabricio Chalub | Livy Real | Alexandre Rademaker | Valeria de Paiva
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This paper describes work on incorporating Princeton’s WordNet morphosemantic links into the fabric of the Portuguese OpenWordNet-PT. Morphosemantic links are relations between verbs and derivationally related nouns that are semantically typed (such as tune-tuner, in Portuguese “afinar-afinador”, linked through an “agent” link). Morphosemantic links have been discussed for Princeton’s WordNet for a while, but have not been added to the official database. These links are very useful: they help us to improve our Portuguese WordNet. Thus we discuss the integration of these links in our base and the issues we encountered with the integration.
An overview of Portuguese WordNets
Valeria de Paiva | Livy Real | Hugo Gonçalo Oliveira | Alexandre Rademaker | Cláudia Freitas | Alberto Simões
Proceedings of the 8th Global WordNet Conference (GWC)
Semantic relations between words are key to building systems that aim to understand and manipulate language. For English, the “de facto” standard for representing this kind of knowledge is Princeton’s WordNet. Here, we describe the wordnet-like resources currently available for Portuguese: their origins, methods of creation, sizes, and usage restrictions. We start tackling the problem of comparing them, but only in quantitative terms. Finally, we sketch ideas for potential collaboration between some of the projects that produce Portuguese wordnets.
2015
Anotação de corpus com a OpenWordNet-PT: um exercício de desambiguação (Sense annotation with OpenWordNet-PT: an exercise of word sense disambiguation)
Cláudia Freitas | Livy Real | Alexandre Rademaker
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology
Seeing is Correcting: curating lexical resources using social interfaces
Livy Real | Fabricio Chalub | Valeria de Paiva | Claudia Freitas | Alexandre Rademaker
Proceedings of the 4th Workshop on Linked Data in Linguistics: Resources and Applications
HAREM and Klue: how to put two tagsets for named entities annotation together
Livy Real | Alexandre Rademaker
Proceedings of the Fifth Named Entity Workshop
2014
OpenWordNet-PT: A Project Report
Alexandre Rademaker | Valeria de Paiva | Gerard de Melo | Livy Real | Maira Gatti
Proceedings of the Seventh Global Wordnet Conference
NomLex-PT: A Lexicon of Portuguese Nominalizations
Valeria de Paiva | Livy Real | Alexandre Rademaker | Gerard de Melo
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
This paper presents NomLex-PT, a lexical resource describing Portuguese nominalizations. NomLex-PT connects verbs to their nominalizations, thereby enabling NLP systems to observe the potential semantic relationships between the two words when analysing a text. NomLex-PT is freely available and encoded in RDF for easy integration with other resources. Most notably, we have integrated NomLex-PT with OpenWordNet-PT, an open Portuguese Wordnet.
Co-authors
- Valeria de Paiva 11
- Alexandre Rademaker 8
- Cláudia Freitas 4
- Aikaterini-Lida Kalouli 4
- Lucas Nildaimon dos Santos Silva 4
- Helena de Medeiros Caseli 3
- Fabricio Chalub 3
- Hugo Gonçalo Oliveira 3
- Raquel Amaro 2
- Fabiana Avais 2
- Annebeth Buis 2
- Daniela Claro 2
- Gerard De Melo 2
- Pablo Gamallo 2
- Marcos Garcia 2
- Martha Palmer 2
- Carolina Francisco Gadelha Rodrigues 2
- Alan Romualdo 2
- António Teixeira 2
- Tatiana da Silva Gama 2
- João Barbirato 1
- Lilian Berton 1
- Eckhard Bick 1
- Marcos Carreira 1
- Christian Freitas 1
- Maíra Gatti 1
- Paulo Augusto de Lima Medeiros 1
- Valeria Paiva 1
- Carlos Ramisch 1
- Alberto Simões 1
- Karina Soares 1
- Esther da Cunha Soares 1
- Fernando Guedes Souza 1
- Fernando Zagatti 1
- Fernando Rezende Zagatti 1
- Phillipe Derwich Silva Zaidan 1
- Ana Cláudia Zandavalle 1
- Ana Claudia Bianchini Zandavalle 1
- Alice Florencio Severino da Silva 1