Carlos Henrique Santos Barros


2026

Automatic pun detection remains challenging because it depends on lexical ambiguity and contextual interaction, neither of which is explicitly captured by linear text representations. In Portuguese, TF-IDF-based ensemble methods provide competitive and interpretable baselines but remain limited by surface-level features. This work investigates whether corpus-based graph information can complement such methods. Three graph representations are constructed from the Puntuguese corpus: a Co-occurrence graph, a PPMI-weighted graph, and a Pun-Context graph. In the proposed pipeline, each graph is converted into low-dimensional node embeddings with TruncatedSVD, which are then aggregated into document-level features and concatenated with TF-IDF representations in a soft-voting ensemble. Experimental results on the test set show that graph-based enrichment does not uniformly improve performance: the Pun-Context and PPMI graphs yield the strongest graph-augmented results, whereas combining all three graphs degrades performance. These findings indicate that the usefulness of graph-based information depends strongly on how lexical relations are encoded and aggregated at the document level.
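As a concrete illustration, the following minimal Python sketch traces this pipeline for the PPMI-weighted graph using scikit-learn. The toy corpus, the window size, the mean-pooling aggregation, and the choice of base classifiers (LogisticRegression and RandomForestClassifier) are illustrative assumptions; the abstract does not specify these details.

# Minimal sketch of the graph-augmented pipeline described above, assuming a
# window-based co-occurrence graph and mean-pooled node embeddings. The corpus,
# window size, and hyperparameters are illustrative, not those of the thesis.
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["a pun plays on two senses", "plain text has one sense",
        "two senses collide in a pun", "one sense only here"]
labels = [1, 0, 1, 0]

# 1. Build a symmetric word co-occurrence matrix over a sliding window.
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}
cooc = np.zeros((len(vocab), len(vocab)))
for d in docs:
    toks = d.split()
    for i, w in enumerate(toks):
        for v in toks[max(0, i - 2):i] + toks[i + 1:i + 3]:  # window = 2
            cooc[idx[w], idx[v]] += 1

# 2. Re-weight edges with positive PPMI: max(0, log p(w,v) / (p(w) p(v))).
total = cooc.sum()
pw = cooc.sum(axis=1, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((cooc / total) / (pw @ pw.T))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# 3. Factorize the graph into low-dimensional node embeddings.
node_emb = TruncatedSVD(n_components=3, random_state=0).fit_transform(ppmi)

# 4. Aggregate node embeddings into document features (mean pooling assumed).
doc_graph = np.array([
    np.mean([node_emb[idx[w]] for w in d.split()], axis=0) for d in docs])

# 5. Concatenate TF-IDF with graph features and train a soft-voting ensemble.
tfidf = TfidfVectorizer().fit_transform(docs).toarray()
X = np.hstack([tfidf, doc_graph])
ensemble = VotingClassifier(
    [("lr", LogisticRegression(max_iter=1000)),
     ("rf", RandomForestClassifier(random_state=0))],
    voting="soft").fit(X, labels)
print(ensemble.predict(X))

The same skeleton would apply to the Co-occurrence and Pun-Context graphs by swapping the edge-weighting step (step 2) for the corresponding graph construction.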
Supervised models trained on community-labeled data have shown promise in Health Question Answering (HQA), but relying on “likes” as a proxy for clinical usefulness remains controversial. This work investigates the alignment between automated predictions and human perception in Portuguese HQA. Using a subset of the SaudeBR-QA corpus, we compare the predictions of a Random Forest classifier against judgments from a controlled evaluation conducted by laypeople and healthcare professionals. Our results reveal a recurring divergence that we term Superficiality Bias: human evaluators frequently validate very brief answers, whereas the classifier often labels these cases as non-useful under its learned criteria. Rather than indicating that the model is inherently more clinically accurate, this pattern suggests a misalignment between community feedback and feature-driven utility judgments. We argue that crowd-based labels in medical domains should be treated cautiously and complemented with more rigorous annotation protocols.
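The divergence analysis can be sketched as follows. Answer length is used here as a deliberately simple stand-in for the classifier's learned features, and the toy answers, labels, and 25-character brevity threshold are illustrative assumptions, not the actual SaudeBR-QA data or annotation protocol.

# Hedged sketch of the divergence analysis described above: a Random Forest
# trained on like-derived labels is compared against human judgments, and
# disagreements on very brief answers are flagged as candidate instances of
# Superficiality Bias. All data and thresholds below are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def length_features(answers):
    # Character and token counts; the real system would use richer features.
    return np.array([[len(a), len(a.split())] for a in answers])

# Training set labeled by community "likes": brief answers tend to get few.
train = ["See a doctor immediately and describe all of your symptoms.",
         "Ok.", "Drink water, rest, and return if the fever persists.",
         "Maybe."]
like_labels = [1, 0, 1, 0]
clf = RandomForestClassifier(random_state=0).fit(
    length_features(train), like_labels)

# Controlled evaluation subset: human evaluators judged both answers useful.
eval_answers = ["Yes.", "Take it after meals twice a day for seven days."]
human_labels = [1, 1]
preds = clf.predict(length_features(eval_answers))

# Flag disagreements on very brief answers: candidate Superficiality Bias cases.
for ans, pred, human in zip(eval_answers, preds, human_labels):
    if len(ans) < 25 and human == 1 and pred == 0:
        print("Superficiality Bias candidate:", repr(ans))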