Divide-Conquer-Reasoning for Consistency Evaluation and Automatic Improvement of Large Language Models

Wendi Cui, Zhuohang Li, Damien Lopez, Kamalika Das, Bradley A. Malin, Sricharan Kumar, Jiaxin Zhang


Abstract
Evaluating the quality and consistency of text generated by Large Language Models (LLMs) poses a significant, yet unresolved, challenge for industry research. We propose DCR, an automated framework for evaluating and improving the consistency of LLM-generated texts via a divide-conquer-reasoning approach. Unlike existing LLM-based evaluators that operate at the paragraph level, our method employs a divide-and-conquer evaluator (DCE) that breaks down the paragraph-to-paragraph comparison into sentence-to-paragraph comparisons. To facilitate this approach, we also introduce an automatic metric converter (AMC) that translates the output of DCE into an interpretable numeric score. Beyond consistency evaluation, we further present a reason-assisted improver (RAI) that mitigates inconsistencies by leveraging the analytical reasons identified by DCE. Through comprehensive and systematic empirical analysis, we show that our approach outperforms state-of-the-art methods by a large margin (e.g., +16.8% and +32.5% on the SummEval dataset) in consistency evaluation across multiple benchmarks. Our approach also reduces nearly 90% of output inconsistencies in a single iteration, showing promise for effective hallucination mitigation in real-world industrial applications.
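The abstract outlines a three-stage pipeline (DCE → AMC → RAI). Below is a minimal Python sketch of that flow under stated assumptions: the `llm` callable is a hypothetical stand-in for an actual LLM API, the prompts are illustrative rather than the paper's, and AMC is approximated as the fraction of sentences judged consistent; the exact prompts and scoring are in the PDF linked below.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical LLM interface: takes a prompt string, returns the model's reply.
LLMFn = Callable[[str], str]

def divide_conquer_evaluate(candidate: str, reference: str,
                            llm: LLMFn) -> List[Tuple[str, bool, str]]:
    """DCE sketch: judge each candidate sentence against the full reference
    paragraph, collecting (sentence, is_consistent, reason) triples."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", candidate.strip()) if s]
    verdicts = []
    for sent in sentences:
        reply = llm(
            "Is the following sentence consistent with the reference paragraph?\n"
            f"Reference: {reference}\nSentence: {sent}\n"
            "Answer 'yes' or 'no' on the first line, then give a one-line reason."
        )
        is_consistent = reply.lower().startswith("yes")
        reason = reply.split("\n", 1)[-1].strip()
        verdicts.append((sent, is_consistent, reason))
    return verdicts

def auto_metric_convert(verdicts: List[Tuple[str, bool, str]]) -> float:
    """AMC sketch (assumption): score = fraction of consistent sentences,
    giving an interpretable number in [0, 1]."""
    if not verdicts:
        return 1.0
    return sum(ok for _, ok, _ in verdicts) / len(verdicts)

def reason_assisted_improve(candidate: str, reference: str,
                            verdicts: List[Tuple[str, bool, str]],
                            llm: LLMFn) -> str:
    """RAI sketch: feed DCE's reasons for the inconsistent sentences back
    to the model and ask for a corrected paragraph."""
    issues = [f'- "{s}": {r}' for s, ok, r in verdicts if not ok]
    if not issues:
        return candidate  # nothing to fix
    return llm(
        "Rewrite the candidate so it is consistent with the reference, "
        "fixing these issues:\n" + "\n".join(issues) +
        f"\nReference: {reference}\nCandidate: {candidate}"
    )
```

In use, the three stages can be chained and repeated (evaluate, score, improve) until the AMC score converges; the abstract's result suggests a single pass already removes most inconsistencies.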
Anthology ID:
2024.emnlp-industry.25
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Month:
November
Year:
2024
Address:
Miami, Florida, US
Editors:
Franck Dernoncourt, Daniel Preoţiuc-Pietro, Anastasia Shimorina
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
334–361
URL:
https://aclanthology.org/2024.emnlp-industry.25
Cite (ACL):
Wendi Cui, Zhuohang Li, Damien Lopez, Kamalika Das, Bradley A. Malin, Sricharan Kumar, and Jiaxin Zhang. 2024. Divide-Conquer-Reasoning for Consistency Evaluation and Automatic Improvement of Large Language Models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, pages 334–361, Miami, Florida, US. Association for Computational Linguistics.
Cite (Informal):
Divide-Conquer-Reasoning for Consistency Evaluation and Automatic Improvement of Large Language Models (Cui et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-industry.25.pdf