2025
ML-Promise: A Multilingual Dataset for Corporate Promise Verification
Yohei Seki | Hakusen Shu | Anaïs Lhuissier | Hanwool Lee | Juyeon Kang | Min-Yuh Day | Chung-Chi Chen
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Promises made by politicians, corporate leaders, and public figures have a significant impact on public perception, trust, and institutional reputation. However, the complexity and volume of such commitments, coupled with difficulties in verifying their fulfillment, necessitate innovative methods for assessing their credibility. This paper introduces the concept of Promise Verification, a systematic approach involving steps such as promise identification, evidence assessment, and the evaluation of timing for verification. We propose the first multilingual dataset, ML-Promise, which includes English, French, Chinese, Japanese, and Korean, aimed at facilitating in-depth verification of promises, particularly in the context of Environmental, Social, and Governance (ESG) reports. Given the growing emphasis on corporate environmental contributions, this dataset addresses the challenge of evaluating corporate promises, especially in light of practices like greenwashing. We also explore textual and image-based baselines, with promising results from retrieval-augmented generation (RAG) approaches. This work aims to foster further discourse on the accountability of public commitments across multiple languages and domains.
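Editor's note: the abstract mentions RAG-style baselines for promise verification. The sketch below illustrates, under stated assumptions, how such a baseline could be wired: retrieve ESG-report passages relevant to a candidate promise, then compose a verification prompt for an LLM judge. The TF-IDF retriever, the helper names, and the prompt wording are illustrative assumptions, not the ML-Promise authors' implementation.

```python
# Minimal sketch of a retrieval-augmented promise-verification baseline
# (illustrative only; not the ML-Promise authors' implementation).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_evidence(promise: str, report_passages: list[str], k: int = 3) -> list[str]:
    """Rank ESG-report passages by lexical similarity to the promise statement."""
    vectorizer = TfidfVectorizer().fit(report_passages + [promise])
    passage_vecs = vectorizer.transform(report_passages)
    promise_vec = vectorizer.transform([promise])
    scores = cosine_similarity(promise_vec, passage_vecs)[0]
    top = scores.argsort()[::-1][:k]
    return [report_passages[i] for i in top]

def build_prompt(promise: str, evidence: list[str]) -> str:
    """Compose a verification prompt for an LLM judge (hypothetical prompt wording)."""
    joined = "\n".join(f"- {p}" for p in evidence)
    return (
        "Promise under review:\n"
        f"{promise}\n\n"
        "Candidate evidence passages:\n"
        f"{joined}\n\n"
        "Questions: Is this a concrete promise? Is supporting evidence present? "
        "How clear is the evidence, and when could fulfilment be verified?"
    )

passages = [
    "We will cut Scope 1 and 2 emissions by 40% by 2030.",
    "Board diversity reached 35% female representation in 2023.",
    "A new supplier code of conduct was rolled out across all regions.",
]
promise = "We commit to net-zero operations by 2030."
print(build_prompt(promise, retrieve_evidence(promise, passages)))
# The resulting prompt would then be sent to an LLM for the verification judgements.
```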
Multilingual Promise Verification in ESG Reports with Large Language Model Performance Evaluation
Wei-Chen Huang | Hsin-Ting Lu | Wen-Ze Chen | Min-Yuh Day
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Corporate ESG reports often contain statements that are vague or difficult to verify, creating room for potential greenwashing. Building automated systems to evaluate such claims is therefore a relevant research direction. Yet, existing analytical tools still show limited ability to verify sustainability promises in multiple languages, especially beyond English. This study examines how large language models (GPT-5) perform in verifying ESG-related promises across Chinese, Japanese, and English reports, aiming to provide a multilingual evaluation baseline. We assess four verification tasks using the PromiseEval datasets [1] in three languages, comparing five prompting strategies from zero-shot to five-shot learning, including Chain-of-Thought reasoning. The four subtasks are Promise Identification (PI), Evidence Status Assessment (ESA), Evidence Quality Evaluation (EQE), and Verification Timeline Prediction (VTP). The five-shot setting achieved the highest overall performance (71.12 % accuracy, 51.92 % Macro-F1). Although the accuracy results appear higher for Chinese (85.12 %) than for Japanese (68.94 %) and English (63.62 %), this mainly reflects class imbalance in the data. Hence, Macro-F1 provides a fairer comparison across languages. Among the four tasks, Evidence Quality Evaluation (EQE) remains the most difficult. While Chain-of-Thought prompting slightly lowers the overall average, it shows selective benefit on the more complex EQE task. Overall, this work offers a clearer multilingual baseline for ESG promise verification and supports the development of language-based tools that enhance the credibility and transparency of sustainability reporting.
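Editor's note: to make the prompting setup described above concrete, the sketch below shows how an n-shot prompt for the Promise Identification (PI) subtask might be assembled, with an optional Chain-of-Thought suffix. The demonstration examples, the `FewShotExample` structure, and the instruction wording are assumptions for illustration; the paper's exact prompts and GPT-5 API configuration are not reproduced here.

```python
# Illustrative few-shot prompt construction for the Promise Identification (PI)
# subtask; demonstrations and instruction wording are assumptions, not the
# authors' actual prompts.
from dataclasses import dataclass

@dataclass
class FewShotExample:
    statement: str
    label: str  # "Yes" (contains a promise) or "No"

PI_INSTRUCTION = (
    "Decide whether the following ESG-report statement contains a promise "
    "(a forward-looking commitment). Answer Yes or No."
)

def build_few_shot_prompt(examples: list[FewShotExample], query: str,
                          chain_of_thought: bool = False) -> str:
    """Assemble an n-shot prompt; optionally ask for step-by-step reasoning."""
    parts = [PI_INSTRUCTION]
    for ex in examples:
        parts.append(f"Statement: {ex.statement}\nAnswer: {ex.label}")
    suffix = ("Let's think step by step before answering."
              if chain_of_thought else "Answer:")
    parts.append(f"Statement: {query}\n{suffix}")
    return "\n\n".join(parts)

shots = [
    FewShotExample("We aim to reach carbon neutrality by 2040.", "Yes"),
    FewShotExample("Revenue grew 8% year over year.", "No"),
]
print(build_few_shot_prompt(shots,
                            "We will source 100% renewable electricity by 2035.",
                            chain_of_thought=True))
```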
SemEval-2025 Task 6: Multinational, Multilingual, Multi-Industry Promise Verification
Chung-Chi Chen | Yohei Seki | Hakusen Shu | Anaïs Lhuissier | Juyeon Kang | Hanwool Lee | Min-Yuh Day | Hiroya Takamura
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)
While extensive research exists on misinformation and disinformation, there is limited focus on future-oriented commitments, such as corporate ESG promises, which are often difficult to verify yet significantly impact public trust and market stability. To address this gap, we introduce the task of promise verification, leveraging natural language processing (NLP) techniques to automatically detect ESG commitments, identify supporting evidence, and evaluate the consistency between promises and evidence, while also inferring potential verification time points. This paper presents the dataset used in SemEval-2025 PromiseEval, outlines participant solutions, and discusses key findings. The goal is to enhance transparency in corporate discourse, strengthen investor trust, and support regulators in monitoring the fulfillment of corporate commitments.
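Editor's note: one way to picture the task structure described above (promise detection, evidence identification, consistency evaluation, and verification-time inference) is as one labeled record per statement. The field names and label values in the sketch below are illustrative assumptions, not the official PromiseEval annotation schema.

```python
# Hypothetical record structure for a promise-verification instance; field and
# label names are illustrative, not the official PromiseEval schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class PromiseInstance:
    text: str                             # statement extracted from an ESG report
    promise_status: str                   # e.g. "Yes" / "No"
    evidence_status: Optional[str]        # e.g. "Yes" / "No", if a promise is present
    evidence_quality: Optional[str]       # e.g. "Clear" / "Not Clear" / "Misleading"
    verification_timeline: Optional[str]  # e.g. "within_2_years" / "more_than_5_years"

example = PromiseInstance(
    text="We pledge to halve water consumption at all plants by 2028.",
    promise_status="Yes",
    evidence_status="No",
    evidence_quality=None,
    verification_timeline="within_5_years",
)
print(example)
```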
2024
Multi-Lingual ESG Impact Duration Inference
Chung-Chi Chen | Yu-Min Tseng | Juyeon Kang | Anaïs Lhuissier | Yohei Seki | Hanwool Lee | Min-Yuh Day | Teng-Tsai Tu | Hsin-Hsi Chen
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing
To accurately assess the dynamic impact of a company’s activities on its Environmental, Social, and Governance (ESG) scores, we have initiated a series of shared tasks, named ML-ESG. These tasks adhere to the MSCI guidelines for annotating news articles across various languages. This paper details the third iteration of our series, ML-ESG-3, with a focus on impact duration inference—a task that poses significant challenges in estimating the enduring influence of events, even for human analysts. In ML-ESG-3, we provide datasets in five languages (Chinese, English, French, Korean, and Japanese) and share insights from our experience in compiling such subjective datasets. Additionally, this paper reviews the methodologies proposed by ML-ESG-3 participants and offers a comparative analysis of the models’ performances. Concluding the paper, we introduce the concept for the forthcoming series of shared tasks, namely multi-lingual ESG promise verification, and discuss its potential contributions to the field.
IMNTPU at ML-ESG-3: Transformer Language Models for Multi-Lingual ESG Impact Type and Duration Classification
Yu Han Kao | Vidhya Nataraj | Ting-Chi Wang | Yu-Jyun Zheng | Hsiao-Chuan Liu | Wen-Hsuan Liao | Chia-Tung Tsai | Min-Yuh Day
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing
Our team participated in the multi-lingual Environmental, Social, and Governance (ESG) classification task, focusing on datasets in three languages: English, French, and Japanese. This study leverages Pre-trained Language Models (PLMs), with a particular emphasis on the Bidirectional Encoder Representations from Transformers (BERT) framework, to analyze sentence and document structures across these varied linguistic datasets. The team’s experimentation with diverse PLM-based network designs facilitated a nuanced comparative analysis within this multi-lingual context. For each language-specific dataset, different BERT-based transformer models were trained and evaluated. Notably, in the experimental results, the RoBERTa-Base model emerged as the most effective in official evaluation, particularly in the English dataset, achieving a micro-F1 score of 58.82 %, thereby demonstrating superior performance in classifying ESG impact levels. This research highlights the adaptability and effectiveness of PLMs in tackling the complexities of multi-lingual ESG classification tasks, underscoring the exceptional performance of the RoBERTa-Base model in processing English-language data.
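Editor's note: as a rough illustration of the PLM fine-tuning setup described above, the snippet below sketches a RoBERTa-Base sequence classifier using the Hugging Face transformers API. The label count, hyperparameters, and dataset handling are placeholder assumptions, not the IMNTPU team's actual configuration.

```python
# Sketch of fine-tuning RoBERTa-Base for ESG impact-level classification
# (placeholder hyperparameters; not the IMNTPU team's exact configuration).
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "roberta-base"
NUM_LABELS = 3  # assumed number of impact-level classes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                           num_labels=NUM_LABELS)

def tokenize(batch):
    # Truncate/pad news sentences to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

args = TrainingArguments(
    output_dir="esg-impact-roberta",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

# `train_dataset` / `eval_dataset` would be datasets.Dataset objects with
# "text" and "label" columns, e.g. built from the ML-ESG-3 release:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_dataset.map(tokenize, batched=True),
#                   eval_dataset=eval_dataset.map(tokenize, batched=True))
# trainer.train()
```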
2023
Multi-Lingual ESG Issue Identification
Chung-Chi Chen | Yu-Min Tseng | Juyeon Kang | Anaïs Lhuissier | Min-Yuh Day | Teng-Tsai Tu | Hsin-Hsi Chen
Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting
Multi-Lingual ESG Impact Type Identification
Chung-Chi Chen | Yu-Min Tseng | Juyeon Kang | Anaïs Lhuissier | Yohei Seki | Min-Yuh Day | Teng-Tsai Tu | Hsin-Hsi Chen
Proceedings of the Sixth Workshop on Financial Technology and Natural Language Processing
Assessing a company’s sustainable development goes beyond just financial metrics; the inclusion of environmental, social, and governance (ESG) factors is becoming increasingly vital. The ML-ESG shared task series seeks to pioneer discussions on news-driven ESG ratings, drawing inspiration from the MSCI ESG rating guidelines. In its second edition, ML-ESG-2 emphasizes impact type identification, offering datasets in four languages: Chinese, English, French, and Japanese. Of the 28 teams registered, 8 participated in the official evaluation. This paper presents a comprehensive overview of ML-ESG-2, detailing the dataset specifics and summarizing the performance outcomes of the participating teams.
2019
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)
Chen-Yu Chiang | Min-Yuh Day | Jen-Tzung Chien
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)
Deep Learning Based Fake News AI Detection: Evidence From Taiwan News Reports (植基於深度學習假新聞人工智慧偵測:台灣真實資料實作)
Chih-Chien Wang | Min-Yuh Day | Lin-Lung Hu
Proceedings of the 31st Conference on Computational Linguistics and Speech Processing (ROCLING 2019)
2018
International Journal of Computational Linguistics & Chinese Language Processing, Volume 23, Number 2, December 2018
Chen-Yu Chiang | Min-Yuh Day
International Journal of Computational Linguistics & Chinese Language Processing, Volume 23, Number 2, December 2018