AVA: an Automatic eValuation Approach for Question Answering Systems

Thuy Vu, Alessandro Moschitti

Abstract
We introduce AVA, an automatic evaluation approach for Question Answering, which, given a set of questions associated with Gold Standard answers (references), can estimate system Accuracy. AVA uses Transformer-based language models to encode question, answer, and reference texts. This allows for effectively assessing answer correctness using the similarity between the reference and an automatic answer, biased towards the question semantics. To design, train, and test AVA, we built multiple large training, development, and test sets on public and industrial benchmarks. Our innovative solutions achieve up to 74.7% F1 score in predicting human judgment for single answers. Additionally, AVA can be used to evaluate the overall system Accuracy with an error lower than 7% at 95% confidence, when measured on several QA systems.
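The core idea in the abstract can be illustrated with a minimal sketch in Python using the Hugging Face transformers library. This is not the authors' trained AVA model, which is a Transformer classifier learned on (question, answer, reference) triples; the sketch only shows the underlying signal the abstract describes: similarity between a candidate answer and the reference, conditioned on the question. The checkpoint name, mean pooling, and cosine-similarity scoring are illustrative assumptions.

# Illustrative sketch only -- NOT the trained AVA model from the paper.
# Each answer is encoded jointly with the question as a sentence pair
# (so the embedding is biased towards the question semantics), and the
# candidate is scored by cosine similarity with the reference answer.
# "bert-base-uncased", mean pooling, and cosine scoring are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def encode(question: str, answer: str) -> torch.Tensor:
    # Encode (question, answer) as a sentence pair with one Transformer pass.
    inputs = tokenizer(question, answer, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state      # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling

def ava_like_score(question: str, candidate: str, reference: str) -> float:
    # Similarity between the question-conditioned embeddings of the
    # automatic answer and the Gold Standard reference.
    c, r = encode(question, candidate), encode(question, reference)
    return torch.nn.functional.cosine_similarity(c, r).item()

q = "When was the Eiffel Tower completed?"
ref = "The Eiffel Tower was completed in 1889."
print(ava_like_score(q, "It was finished in 1889.", ref))  # expected: higher
print(ava_like_score(q, "It is in Paris, France.", ref))   # expected: lower

Thresholding such per-answer scores and averaging the resulting correctness decisions over a test set is what yields the overall system Accuracy estimate the abstract refers to.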
Anthology ID:
2021.naacl-main.412
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
5223–5233
URL:
https://aclanthology.org/2021.naacl-main.412
DOI:
10.18653/v1/2021.naacl-main.412
Cite (ACL):
Thuy Vu and Alessandro Moschitti. 2021. AVA: an Automatic eValuation Approach for Question Answering Systems. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5223–5233, Online. Association for Computational Linguistics.
Cite (Informal):
AVA: an Automatic eValuation Approach for Question Answering Systems (Vu & Moschitti, NAACL 2021)
PDF:
https://aclanthology.org/2021.naacl-main.412.pdf
Video:
https://aclanthology.org/2021.naacl-main.412.mp4