Interview Evaluation: A Novel Approach for Automatic Evaluation of Conversational Question Answering Models

Xibo Li, Bowei Zou, Yifan Fan, Yanling Li, Ai Ti Aw, Yu Hong


Abstract
Conversational Question Answering (CQA) aims to provide natural language answers to users in information-seeking dialogues. Existing CQA benchmarks often evaluate models using pre-collected human-human conversations. However, replacing the model-predicted dialogue history with the ground truth compromises the naturalness and sustainability of CQA evaluation. While previous studies proposed using predicted history together with question-rewriting techniques to resolve coreferences and incoherencies, this approach makes each question self-contained and detaches it from the conversational context. In this paper, we propose a novel automatic evaluation approach, interview evaluation. Specifically, ChatGPT acts as the interviewer (Q agent) with a set of carefully designed prompts, and the CQA model under test serves as the interviewee (A agent). During the interview evaluation, questions are dynamically generated by the Q agent to guide the A agent toward the correct answer through an interactive process. We evaluated four different models on QuAC and two models on CoQA in our experiments. The experimental results demonstrate that our interview evaluation has advantages over previous CQA evaluation approaches, particularly in terms of naturalness and coherence. The source code is made publicly available.
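The abstract describes an interactive protocol in which a Q agent iteratively questions an A agent. The following is a minimal sketch of that loop, under stated assumptions: the paper uses ChatGPT with carefully designed prompts as the Q agent and the CQA model under test as the A agent; the stub functions, class names, and success criterion below are hypothetical placeholders, not the authors' implementation.

```python
# Hypothetical sketch of an interview-evaluation loop: a Q agent generates
# questions turn by turn to guide an A agent (the CQA model under test)
# toward a target answer. Agent internals are stubbed out.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    question: str
    answer: str


@dataclass
class InterviewState:
    passage: str
    target_answer: str
    history: List[Turn] = field(default_factory=list)


def q_agent(state: InterviewState) -> str:
    """Hypothetical interviewer: in the paper this would prompt ChatGPT with
    the passage, the dialogue history, and the target answer."""
    return f"Follow-up question #{len(state.history) + 1} about the passage"


def a_agent(question: str, state: InterviewState) -> str:
    """Hypothetical interviewee: would run the CQA model on the passage plus
    its own predicted history rather than the ground-truth history."""
    return "model-predicted answer"


def interview(state: InterviewState, max_turns: int = 5) -> bool:
    """Run the interactive evaluation until the A agent matches the target
    answer or the turn budget is exhausted (matching rule is assumed here)."""
    for _ in range(max_turns):
        question = q_agent(state)
        answer = a_agent(question, state)
        state.history.append(Turn(question, answer))
        if answer.strip().lower() == state.target_answer.strip().lower():
            return True
    return False


if __name__ == "__main__":
    state = InterviewState(passage="...", target_answer="model-predicted answer")
    print("success:", interview(state))
```

The sketch only illustrates the turn-taking structure; the actual prompts, answer-matching criteria, and turn budgets are defined in the paper.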
Anthology ID:
2023.emnlp-main.209
Volume:
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Month:
December
Year:
2023
Address:
Singapore
Editors:
Houda Bouamor, Juan Pino, Kalika Bali
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3435–3446
URL:
https://aclanthology.org/2023.emnlp-main.209
DOI:
10.18653/v1/2023.emnlp-main.209
Cite (ACL):
Xibo Li, Bowei Zou, Yifan Fan, Yanling Li, Ai Ti Aw, and Yu Hong. 2023. Interview Evaluation: A Novel Approach for Automatic Evaluation of Conversational Question Answering Models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3435–3446, Singapore. Association for Computational Linguistics.
Cite (Informal):
Interview Evaluation: A Novel Approach for Automatic Evaluation of Conversational Question Answering Models (Li et al., EMNLP 2023)
PDF:
https://aclanthology.org/2023.emnlp-main.209.pdf
Video:
https://aclanthology.org/2023.emnlp-main.209.mp4