Evaluating Reference-Free Summarization Quality Metrics for Portuguese: A Study with Human Judgments in Financial News

João Victor Assaoka Ribeiro; Thomas Pires Correia; José Vitor Souza Cardoso Requena; Lilian Berton

Abstract

Automatic summarization of financial news in Portuguese lacks reliable reference-free evaluation metrics. While LLM-as-a-Judge approaches are gaining traction, their correlation with human perception in specialized domains remains under-explored. This work evaluates the efficacy of Question Answering (QA) based metrics against a direct LLM-as-a-Judge baseline for Portuguese financial news. We propose a pipeline comparing Lexical, Binary, and Semantic (LLM-based) QA scoring methods, validated against a human ground truth of 50 news items annotated for Faithfulness and Completeness. Our results show that granular QA metrics significantly outperform the monolithic LLM-Judge in evaluating Completeness, with QA-Binary achieving the highest rank correlation (ρ ≈ 0.49 with pessimistic human aggregation). For Faithfulness, we observe a strong ceiling effect in human evaluation, yet the Semantic QA metric demonstrated a "super-human" ability to detect subtle hallucinations (e.g., temporal shifts) missed by annotators. We conclude that decomposing evaluation into atomic QA pairs is superior to holistic judging for the Portuguese financial domain.
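The rank correlation reported above can be illustrated with a minimal sketch. It assumes that "pessimistic human aggregation" means taking the lowest score any annotator gave an item; the abstract does not define it, so this reading is an assumption. The scores below are invented toy values, not data from the study, and the function names are hypothetical.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

def pessimistic(scores_per_item):
    """Assumed aggregation: keep the lowest annotator score per item."""
    return [min(scores) for scores in scores_per_item]

# Toy data (illustrative only): two annotators per summary, one metric score each.
human_completeness = [[4, 5], [2, 3], [5, 5], [1, 2], [3, 4]]
metric_scores = [0.8, 0.4, 0.9, 0.1, 0.6]

rho = spearman(pessimistic(human_completeness), metric_scores)
print(f"Spearman rho = {rho:.2f}")
```

Under this assumption, a single low rating pulls down an item's human score, so a metric correlates well only if it also penalizes summaries that at least one annotator found incomplete.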

Anthology ID: 2026.propor-1.89
Volume: Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1
Month: April
Year: 2026
Address: Salvador, Brazil
Editors: Marlo Souza, Iria de-Dios-Flores, Diana Santos, Larissa Freitas, Jackson Wilke da Cruz Souza, Eugénio Ribeiro
Venue: PROPOR
Publisher: Association for Computational Linguistics
Pages: 899–907
URL: https://aclanthology.org/2026.propor-1.89/
Cite (ACL): João Victor Assaoka Ribeiro, Thomas Pires Correia, José Vitor Souza Cardoso Requena, and Lilian Berton. 2026. Evaluating Reference-Free Summarization Quality Metrics for Portuguese: A Study with Human Judgments in Financial News. In Proceedings of the 17th International Conference on Computational Processing of Portuguese (PROPOR 2026) - Vol. 1, pages 899–907, Salvador, Brazil. Association for Computational Linguistics.
Cite (Informal): Evaluating Reference-Free Summarization Quality Metrics for Portuguese: A Study with Human Judgments in Financial News (Ribeiro et al., PROPOR 2026)
PDF: https://aclanthology.org/2026.propor-1.89.pdf
