Thiago A. S. Pardo
2026
Enhanced Universal Dependencies in the Wild: Evaluating Portuguese EUD Parsing in Realistic Scenarios
Elvis A. de Souza | Thiago A. S. Pardo
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Enhanced Universal Dependencies (EUD) provide a more informative syntactic representation than Basic Universal Dependencies by relaxing tree constraints to allow for graph structures. While conversion rules from basic to enhanced relations have been established for Portuguese, they were previously evaluated only on journalistic text using gold-standard basic syntactic trees. This paper evaluates the robustness of these rules in diverse scenarios ("in the wild"), encompassing other text genres and domains, as well as realistic parsing pipelines that rely on automatically generated basic syntax. Our results demonstrate that Portuguese-specific rules consistently outperform universal rules. However, the reliance on automatic basic syntax significantly impacts performance. This degradation is particularly severe when the domain of the input text differs from the training data of the basic parser. We also provide a detailed error analysis, identifying specific difficult linguistic phenomena and scenarios.
Socially Responsible and Explainable Automated Fact-Checking and Hate Speech Detection
Francielle Vargas | Fabrício Benevenuto | Thiago A. S. Pardo
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 2
This Ph.D. dissertation advances the state of the art in Natural Language Processing (NLP) for Portuguese by proposing new data resources and explainable methods for hate speech detection and automated fact-checking. The thesis introduces several benchmark datasets for Brazilian Portuguese (HateBR, HateBRXplain, HateBRMoralXplain, MFTCXplain, MOL, and FactNews), which have been widely adopted by the research community and address critical gaps in the availability of high-quality annotated resources for Portuguese. In addition, this dissertation proposes novel post-hoc and self-explaining NLP methods: Sentence-Level Factual Reasoning (SELFAR), Social Stereotype Analysis (SSA), Contextual Bag-of-Words with Interpretable Input and Feature Optimization (B+M), Supervised Rational Attention (SRA), and Supervised Moral Rational Attention (SMRA). Across multiple tasks and datasets in Portuguese, these methods outperform baselines while improving interpretability and robustness, demonstrating that explainability and performance can be jointly optimized. Finally, this thesis has achieved significant national and international impact, being cited by leading universities and research institutes worldwide and fostering new M.Sc. and Ph.D. research projects in Brazil. Its scientific and social contributions have also been recognized with multiple prestigious national and international awards, including the Google LARA, the Maria Carolina Monard Best Thesis Award in Artificial Intelligence, the Trevisan Prize for Students “AI for Good” from Bocconi University for rigorous computer science research in AI with social impact, and the Diversity and Inclusion Award from the Association for Computational Linguistics (ACL). The thesis has also received two nominations for the Brazilian Computer Society Thesis Awards, in Computer Science and in Multimedia, Hypermedia, and Web.
A UD Parser to the Rescue: A Method for Bringing a Classical Annotated Corpus to Life Again
Lucelene Lopes | Magali S. Duran | Thiago A. S. Pardo
Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
This paper reports on an effort to recover the classical morphosyntactically annotated corpus MacMorpho and realign it with the current version of the Universal Dependencies framework. We introduce a knowledge-rich approach grounded in a syntactic parser and a specially designed tagset compatibility strategy to generate a "silver-standard" resource: MacMorpho-UD-2.17. We evaluate this resource through multiple complementary methods, providing evidence for the quality of both our approach and the resulting annotation.
2025
Extending the Enhanced Universal Dependencies – addressing subjects in pro-drop languages
Magali Sanches Duran | Elvis A. de Souza | Maria das Graças Volpe Nunes | Adriana Silvina Pagano | Thiago A. S. Pardo
Proceedings of the Eighth Workshop on Universal Dependencies (UDW, SyntaxFest 2025)
Enhanced Universal Dependencies (EUD) serve as a crucial link between syntax and semantics. Beyond basic syntactic dependencies, EUD provides valuable refined logical connections for downstream tasks such as semantic role labeling, coreference resolution, information extraction, and question answering. The original EUD framework defines six types of relationships, but this paper introduces an extension designed to address subject propagation in pro-drop languages. This “Extended EUD” proposal increases the number of relationships that may be annotated in sentences, improving linguistic representation. Additionally, we report our experiments on a corpus of Portuguese (a pro-drop language), which we make publicly available to the research community.
2024
Improving Explainable Fact-Checking via Sentence-Level Factual Reasoning
Francielle Vargas | Isadora Salles | Diego Alves | Ameeta Agrawal | Thiago A. S. Pardo | Fabrício Benevenuto
Proceedings of the Seventh Fact Extraction and VERification Workshop (FEVER)
Most existing fact-checking systems are unable to explain their decisions by providing relevant rationales (justifications) for their predictions. This lack of transparency poses significant risks, such as the prevalence of unexpected biases, which may increase political polarization due to limitations in impartiality. To address this critical gap, we introduce SEntence-Level FActual Reasoning (SELFAR), aimed at improving explainable fact-checking. SELFAR relies on fact extraction and verification by predicting the news source reliability and factuality (veracity) of news articles or claims at the sentence level, generating post-hoc explanations using SHAP/LIME and zero-shot prompts. Our experiments show that unreliable news stories predominantly consist of subjective statements, in contrast to reliable ones. Consequently, predicting unreliable news articles at the sentence level by analyzing impartiality and subjectivity is a promising approach for fact extraction and improving explainable fact-checking. Furthermore, LIME outperforms SHAP in explaining predictions on reliability. Additionally, while zero-shot prompts provide highly readable explanations and achieve an accuracy of 0.71 in predicting factuality, their tendency to hallucinate remains a challenge. Lastly, this paper also presents the first study on explainable fact-checking in the Portuguese language.
2023
Construções sintáticas do português que desafiam a tarefa de parsing: uma análise qualitativa
Magali S. Duran | Maria das Graças V. Nunes | Thiago A. S. Pardo
Proceedings of the 2nd Edition of the Universal Dependencies Brazilian Festival
2020
NILC at WebNLG+: Pretrained Sequence-to-Sequence Models on RDF-to-Text Generation
Marco Antonio Sobrevilla Cabezudo | Thiago A. S. Pardo
Proceedings of the 3rd International Workshop on Natural Language Generation from the Semantic Web (WebNLG+)
This paper describes the submission by the NILC Computational Linguistics research group of the University of São Paulo/Brazil to the RDF-to-Text task for English at the WebNLG+ challenge. The success of current pretrained models like BERT or GPT-2 in text-to-text generation tasks is well known; however, their application to data-to-text generation has not been well studied or proven. We therefore explore how well a pretrained model, in particular BART, performs on the data-to-text generation task. The results obtained were worse than the baseline and other systems in almost all automatic measures. However, the human evaluation shows better results for our system. In addition, the results suggest that BART may generate paraphrases of reference texts.
2015
On Strategies of Human Multi-Document Summarization
Renata Tironi de Camargo | Ariani Di Felippo | Thiago A. S. Pardo
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology
Enriching entity grids and graphs with discourse relations: the impact in local coherence evaluation
Márcio de S. Dias | Thiago A. S. Pardo
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology
Joint semantic discourse models for automatic multi-document summarization
Paula C. Figueira Cardoso | Thiago A. S. Pardo
Proceedings of the 10th Brazilian Symposium in Information and Human Language Technology
2013
Subtopic Annotation in a Corpus of News Texts: Steps Towards Automatic Subtopic Segmentation
Paula C. F. Cardoso | Maite Taboada | Thiago A. S. Pardo
Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology