Pingchuan Ma
2024
Split and Merge: Aligning Position Biases in LLM-based Evaluators
Zongjie Li, Chaozheng Wang, Pingchuan Ma, Daoyuan Wu, Shuai Wang, Cuiyun Gao, Yang Liu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Large language models (LLMs) have shown promise as automated evaluators for assessing the quality of answers generated by AI systems. However, LLM-based evaluators exhibit position bias, or inconsistency, when used to evaluate candidate answers in pairwise comparisons, favoring either the first or second answer regardless of content. To address this limitation, we propose PORTIA, an alignment-based system designed to mimic human comparison strategies to calibrate position bias in a lightweight yet effective manner. Specifically, PORTIA splits the answers into multiple segments, taking into account both length and semantics, and merges them back into a single prompt for evaluation by LLMs. Extensive experiments with six LLMs on 11,520 answer pairs demonstrate that PORTIA markedly enhances the consistency rates for all models and forms of comparison tested, achieving an average relative improvement of 47.46%. It also enables PORTIA-enhanced GPT-3.5 to achieve agreement rates with humans comparable to those of GPT-4 and elevates GPT-4's consistency rate up to 98%. Subsequent human evaluations indicate that the PORTIA-enhanced GPT-3.5 model can even surpass standalone GPT-4 in terms of alignment with human evaluators, highlighting PORTIA's ability to correct position bias, improve LLM consistency, and boost performance while remaining cost-efficient.
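The split-and-merge idea in the abstract above can be illustrated with a minimal sketch. It assumes a length-only split that cuts each answer at sentence boundaries into at most k roughly equal segments; PORTIA as described also uses semantics to align segments, which this sketch omits. It also assumes that "consistency" means the evaluator names the same winning answer after the two answers swap positions. The function names `split_into_segments`, `merge_prompt`, and `consistency_rate`, and the prompt wording, are hypothetical and are not taken from the paper.

```python
import re
from typing import List, Sequence


def split_into_segments(answer: str, k: int) -> List[str]:
    """Split an answer into at most k segments of roughly equal length,
    cutting only at sentence boundaries (length-only; no semantic alignment)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if k <= 1 or len(sentences) <= 1:
        return [" ".join(sentences)]
    target = sum(len(s) for s in sentences) / k
    segments: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        current.append(sentence)
        # Close the segment once it reaches the target length,
        # but never open more than k segments in total.
        if sum(len(x) for x in current) >= target and len(segments) < k - 1:
            segments.append(" ".join(current))
            current = []
    if current:
        segments.append(" ".join(current))
    return segments


def merge_prompt(question: str, answer_a: str, answer_b: str, k: int = 3) -> str:
    """Interleave the segments of both answers into a single evaluation prompt
    (hypothetical prompt format)."""
    segs_a = split_into_segments(answer_a, k)
    segs_b = split_into_segments(answer_b, k)
    lines = [f"Question: {question}", ""]
    for i in range(max(len(segs_a), len(segs_b))):
        if i < len(segs_a):
            lines.append(f"[Answer 1, part {i + 1}] {segs_a[i]}")
        if i < len(segs_b):
            lines.append(f"[Answer 2, part {i + 1}] {segs_b[i]}")
    lines.append("")
    lines.append("Which answer is better overall? Reply with 1, 2, or tie.")
    return "\n".join(lines)


def consistency_rate(original: Sequence[str], swapped: Sequence[str]) -> float:
    """Fraction of pairs whose verdict is unchanged after swapping answer order.
    Verdicts identify the winner by identity ('A', 'B', 'tie'), not by position."""
    pairs = list(zip(original, swapped))
    if not pairs:
        return 0.0
    return sum(o == s for o, s in pairs) / len(pairs)


if __name__ == "__main__":
    print(merge_prompt("Q?", "One. Two. Three. Four.", "Alpha. Beta.", k=2))
    print(consistency_rate(["A", "B", "tie", "A"], ["A", "A", "tie", "A"]))
```

Interleaving puts corresponding parts of the two answers next to each other, so the evaluator compares them piece by piece rather than reading one whole answer first; whether this matches PORTIA's exact prompt layout is not stated in the abstract.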
2023
InsightPilot: An LLM-Empowered Automated Data Exploration System
Pingchuan Ma, Rui Ding, Shuai Wang, Shi Han, Dongmei Zhang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Exploring data is crucial in data analysis, as it helps users understand and interpret the data more effectively. However, effective data exploration requires in-depth knowledge of the dataset and the user's intent, as well as expertise in data analysis techniques. Lacking any of these can make the process time-consuming and overwhelming. To address this issue, we introduce InsightPilot, an LLM (Large Language Model)-based automated data exploration system designed to simplify the data exploration process. InsightPilot features a set of carefully designed analysis actions that streamline data exploration. Given a natural language question, InsightPilot collaborates with the LLM to issue a sequence of analysis actions, explore the data, and generate insights. We demonstrate the effectiveness of InsightPilot in a user study and a case study, showing how it can help users gain valuable insights from their datasets.
Co-authors
- Shuai Wang (2 papers)
- Zongjie Li (1 paper)
- Chaozheng Wang (1 paper)
- Daoyuan Wu (1 paper)
- Cuiyun Gao (1 paper)