Divide-Conquer-Reasoning for Consistency Evaluation and Automatic Improvement of Large Language Models
Wendi Cui | Zhuohang Li | Damien Lopez | Kamalika Das | Bradley A. Malin | Sricharan Kumar | Jiaxin Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
Evaluating the quality and consistency of text generated by Large Language Models (LLMs) poses a significant, yet unresolved, challenge for industry research. We propose DCR (Divide-Conquer-Reasoning), an automated framework for evaluating and improving the consistency of LLM-generated texts. Unlike existing LLM-based evaluators that operate at the paragraph level, our method employs a divide-and-conquer evaluator (DCE) that breaks down the paragraph-to-paragraph comparison into sentence-to-paragraph comparisons. To facilitate this approach, we also introduce an automatic metric converter (AMC) that translates the output from DCE into an interpretable numeric score. Beyond consistency evaluation, we further present a reason-assisted improver (RAI) that mitigates inconsistencies by leveraging the analytical reasons identified by DCE. Through comprehensive and systematic empirical analysis, we show that our approach outperforms state-of-the-art methods by a large margin (e.g., +16.8% and +32.5% on the SummEval dataset) in consistency evaluation across multiple benchmarks. Our approach also eliminates nearly 90% of output inconsistencies in a single iteration, showing promise for effective hallucination mitigation in real-world industrial applications.
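For intuition, here is a minimal Python sketch of the three-stage pipeline the abstract describes. The `llm` argument (a callable mapping a prompt to a string), the prompt wording, and the scoring rule (fraction of consistent sentences) are illustrative assumptions for this sketch, not the paper's exact prompts or metric.

```python
import re


def dce(llm, candidate: str, reference: str) -> list[dict]:
    """Divide-and-conquer evaluator (DCE): compare each sentence of the
    candidate against the full reference paragraph.

    `llm` is a hypothetical prompt -> str callable; the real prompts and
    output schema are defined in the paper and not reproduced here."""
    sentences = re.split(r"(?<=[.!?])\s+", candidate.strip())
    results = []
    for sent in sentences:
        verdict = llm(
            f"Reference paragraph:\n{reference}\n\n"
            "Is the following sentence consistent with the reference? "
            "Answer 'consistent' or 'inconsistent', then give a reason.\n"
            f"Sentence: {sent}"
        )
        results.append({
            "sentence": sent,
            "consistent": verdict.lower().startswith("consistent"),
            "reason": verdict,
        })
    return results


def amc(results: list[dict]) -> float:
    """Automatic metric converter (AMC): map per-sentence verdicts to an
    interpretable numeric score (here, the fraction judged consistent)."""
    if not results:
        return 1.0
    return sum(r["consistent"] for r in results) / len(results)


def rai(llm, candidate: str, reference: str, results: list[dict]) -> str:
    """Reason-assisted improver (RAI): rewrite the candidate using the
    reasons attached to the sentences judged inconsistent."""
    issues = "\n".join(f"- {r['reason']}" for r in results if not r["consistent"])
    if not issues:
        return candidate  # nothing to fix
    return llm(
        f"Reference paragraph:\n{reference}\n\n"
        f"Candidate:\n{candidate}\n\n"
        f"These sentences were judged inconsistent:\n{issues}\n\n"
        "Rewrite the candidate so it is fully consistent with the reference."
    )
```

A single evaluate-then-improve iteration would chain these as `results = dce(llm, cand, ref)`, `score = amc(results)`, `cand = rai(llm, cand, ref, results)`; the paper reports that one such iteration already removes nearly 90% of inconsistencies.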