What’s in a Name? Answer Equivalence For Open-Domain Question Answering

Chenglei Si, Chen Zhao, Jordan Boyd-Graber


Abstract
A flaw in QA evaluation is that annotations often only provide one gold answer. Thus, model predictions semantically equivalent to the answer but superficially different are considered incorrect. This work explores mining alias entities from knowledge bases and using them as additional gold answers (i.e., equivalent answers). We incorporate answers for two settings: evaluation with additional answers and model training with equivalent answers. We analyse three QA benchmarks: Natural Questions, TriviaQA, and SQuAD. Answer expansion increases the exact match score on all datasets for evaluation, while incorporating it helps model training over real-world datasets. We ensure the additional answers are valid through a human post hoc evaluation.
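As a rough illustration of the evaluation setting described in the abstract, the following Python sketch scores a prediction by exact match, first against the single gold answer and then against the gold answer expanded with aliases. It is a sketch under stated assumptions, not the authors' implementation: the `TOY_ALIASES` table is hand-written here (the paper mines aliases from knowledge bases), the function names are hypothetical, and the normalization follows the common SQuAD-style convention of lowercasing and stripping punctuation and articles.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def expand_answers(gold: list[str], aliases: dict[str, list[str]]) -> list[str]:
    """Return the gold answers plus any aliases listed for them."""
    expanded = list(gold)
    for answer in gold:
        expanded.extend(aliases.get(normalize(answer), []))
    return expanded


def exact_match(prediction: str, gold: list[str]) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize(prediction)
    return any(pred == normalize(g) for g in gold)


# Hypothetical alias table keyed by normalized answer; a stand-in for
# aliases mined from a knowledge base.
TOY_ALIASES = {
    "united states": ["USA", "United States of America", "U.S."],
}

gold = ["United States"]
prediction = "USA"

em_original = exact_match(prediction, gold)
em_expanded = exact_match(prediction, expand_answers(gold, TOY_ALIASES))
print(em_original, em_expanded)
```

With the single gold answer the prediction "USA" is scored as incorrect; once the aliases are added as equivalent answers it is scored as correct, which is the effect the abstract reports as an increase in exact match.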
Anthology ID:
2021.emnlp-main.757
Volume:
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2021
Address:
Online and Punta Cana, Dominican Republic
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
9623–9629
URL:
https://aclanthology.org/2021.emnlp-main.757
DOI:
10.18653/v1/2021.emnlp-main.757
Cite (ACL):
Chenglei Si, Chen Zhao, and Jordan Boyd-Graber. 2021. What’s in a Name? Answer Equivalence For Open-Domain Question Answering. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9623–9629, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
What’s in a Name? Answer Equivalence For Open-Domain Question Answering (Si et al., EMNLP 2021)
PDF:
https://aclanthology.org/2021.emnlp-main.757.pdf
Video:
https://aclanthology.org/2021.emnlp-main.757.mp4
Code
noviscl/answerequiv
Data
Natural Questions, SQuAD, TriviaQA