@inproceedings{huang-etal-2025-multilingual-promise,
title = "Multilingual Promise Verification in {ESG} Reports with Large Language Model Performance Evaluation",
author = "Huang, Wei-Chen and
Lu, Hsin-Ting and
Chen, Wen-Ze and
Day, Min-Yuh",
editor = "Chang, Kai-Wei and
Lu, Ke-Han and
Yang, Chih-Kai and
Tam, Zhi-Rui and
Chang, Wen-Yu and
Wang, Chung-Che",
booktitle = "Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)",
month = nov,
year = "2025",
address = "National Taiwan University, Taipei City, Taiwan",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.rocling-main.32/",
pages = "303--313",
ISBN = "979-8-89176-379-1",
abstract = "Corporate ESG reports often contain statements that are vague or difficult to verify, creating room for potential greenwashing. Building automated systems to evaluate such claims is therefore a relevant research direction. Yet, existing analytical tools still show limited ability to verify sustainability promises in multiple languages, especially beyond English. This study examines how large language models (GPT-5) perform in verifying ESG-related promises across Chinese, Japanese, and English reports, aiming to provide a multilingual evaluation baseline. We assess four verification tasks using the PromiseEval datasets [1] in three languages, comparing five prompting strategies from zero-shot to five-shot learning, including Chain-of-Thought reasoning. The four subtasks are Promise Identification (PI), Evidence Status Assessment (ESA), Evidence Quality Evaluation (EQE), and Verification Timeline Prediction (VTP). The five-shot setting achieved the highest overall performance (71.12 {\%} accuracy, 51.92 {\%} Macro-F1). Although the accuracy results appear higher for Chinese (85.12 {\%}) than for Japanese (68.94 {\%}) and English (63.62 {\%}), this mainly reflects class imbalance in the data. Hence, Macro-F1 provides a fairer comparison across languages. Among the four tasks, Evidence Quality Evaluation (EQE) remains the most difficult. While Chain-of-Thought prompting slightly lowers the overall average, it shows selective benefit on the more complex EQE task. Overall, this work offers a clearer multilingual baseline for ESG promise verification and supports the development of language-based tools that enhance the credibility and transparency of sustainability reporting."
}

<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods ID="huang-etal-2025-multilingual-promise">
    <titleInfo>
      <title>Multilingual Promise Verification in ESG Reports with Large Language Model Performance Evaluation</title>
    </titleInfo>
    <name type="personal">
      <namePart type="given">Wei-Chen</namePart>
      <namePart type="family">Huang</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Hsin-Ting</namePart>
      <namePart type="family">Lu</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Wen-Ze</namePart>
      <namePart type="family">Chen</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <name type="personal">
      <namePart type="given">Min-Yuh</namePart>
      <namePart type="family">Day</namePart>
      <role>
        <roleTerm authority="marcrelator" type="text">author</roleTerm>
      </role>
    </name>
    <originInfo>
      <dateIssued>2025-11</dateIssued>
    </originInfo>
    <typeOfResource>text</typeOfResource>
    <relatedItem type="host">
      <titleInfo>
        <title>Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)</title>
      </titleInfo>
      <name type="personal">
        <namePart type="given">Kai-Wei</namePart>
        <namePart type="family">Chang</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Ke-Han</namePart>
        <namePart type="family">Lu</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Chih-Kai</namePart>
        <namePart type="family">Yang</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Zhi-Rui</namePart>
        <namePart type="family">Tam</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Wen-Yu</namePart>
        <namePart type="family">Chang</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <name type="personal">
        <namePart type="given">Chung-Che</namePart>
        <namePart type="family">Wang</namePart>
        <role>
          <roleTerm authority="marcrelator" type="text">editor</roleTerm>
        </role>
      </name>
      <originInfo>
        <publisher>Association for Computational Linguistics</publisher>
        <place>
          <placeTerm type="text">National Taiwan University, Taipei City, Taiwan</placeTerm>
        </place>
      </originInfo>
      <genre authority="marcgt">conference publication</genre>
      <identifier type="isbn">979-8-89176-379-1</identifier>
    </relatedItem>
    <abstract>Corporate ESG reports often contain statements that are vague or difficult to verify, creating room for potential greenwashing. Building automated systems to evaluate such claims is therefore a relevant research direction. Yet, existing analytical tools still show limited ability to verify sustainability promises in multiple languages, especially beyond English. This study examines how large language models (GPT-5) perform in verifying ESG-related promises across Chinese, Japanese, and English reports, aiming to provide a multilingual evaluation baseline. We assess four verification tasks using the PromiseEval datasets [1] in three languages, comparing five prompting strategies from zero-shot to five-shot learning, including Chain-of-Thought reasoning. The four subtasks are Promise Identification (PI), Evidence Status Assessment (ESA), Evidence Quality Evaluation (EQE), and Verification Timeline Prediction (VTP). The five-shot setting achieved the highest overall performance (71.12 % accuracy, 51.92 % Macro-F1). Although the accuracy results appear higher for Chinese (85.12 %) than for Japanese (68.94 %) and English (63.62 %), this mainly reflects class imbalance in the data. Hence, Macro-F1 provides a fairer comparison across languages. Among the four tasks, Evidence Quality Evaluation (EQE) remains the most difficult. While Chain-of-Thought prompting slightly lowers the overall average, it shows selective benefit on the more complex EQE task. Overall, this work offers a clearer multilingual baseline for ESG promise verification and supports the development of language-based tools that enhance the credibility and transparency of sustainability reporting.</abstract>
    <identifier type="citekey">huang-etal-2025-multilingual-promise</identifier>
    <location>
      <url>https://aclanthology.org/2025.rocling-main.32/</url>
    </location>
    <part>
      <date>2025-11</date>
      <extent unit="page">
        <start>303</start>
        <end>313</end>
      </extent>
    </part>
  </mods>
</modsCollection>

%0 Conference Proceedings
%T Multilingual Promise Verification in ESG Reports with Large Language Model Performance Evaluation
%A Huang, Wei-Chen
%A Lu, Hsin-Ting
%A Chen, Wen-Ze
%A Day, Min-Yuh
%Y Chang, Kai-Wei
%Y Lu, Ke-Han
%Y Yang, Chih-Kai
%Y Tam, Zhi-Rui
%Y Chang, Wen-Yu
%Y Wang, Chung-Che
%S Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
%D 2025
%8 November
%I Association for Computational Linguistics
%C National Taiwan University, Taipei City, Taiwan
%@ 979-8-89176-379-1
%F huang-etal-2025-multilingual-promise
%X Corporate ESG reports often contain statements that are vague or difficult to verify, creating room for potential greenwashing. Building automated systems to evaluate such claims is therefore a relevant research direction. Yet, existing analytical tools still show limited ability to verify sustainability promises in multiple languages, especially beyond English. This study examines how large language models (GPT-5) perform in verifying ESG-related promises across Chinese, Japanese, and English reports, aiming to provide a multilingual evaluation baseline. We assess four verification tasks using the PromiseEval datasets [1] in three languages, comparing five prompting strategies from zero-shot to five-shot learning, including Chain-of-Thought reasoning. The four subtasks are Promise Identification (PI), Evidence Status Assessment (ESA), Evidence Quality Evaluation (EQE), and Verification Timeline Prediction (VTP). The five-shot setting achieved the highest overall performance (71.12 % accuracy, 51.92 % Macro-F1). Although the accuracy results appear higher for Chinese (85.12 %) than for Japanese (68.94 %) and English (63.62 %), this mainly reflects class imbalance in the data. Hence, Macro-F1 provides a fairer comparison across languages. Among the four tasks, Evidence Quality Evaluation (EQE) remains the most difficult. While Chain-of-Thought prompting slightly lowers the overall average, it shows selective benefit on the more complex EQE task. Overall, this work offers a clearer multilingual baseline for ESG promise verification and supports the development of language-based tools that enhance the credibility and transparency of sustainability reporting.
%U https://aclanthology.org/2025.rocling-main.32/
%P 303-313