Referenceless Parsing-Based Evaluation of AMR-to-English Generation

Emma Manning, Nathan Schneider


Abstract
Reference-based automatic evaluation metrics are notoriously limited for NLG due to their inability to fully capture the range of possible outputs. We examine a referenceless alternative: evaluating the adequacy of English sentences generated from Abstract Meaning Representation (AMR) graphs by parsing into AMR and comparing the parse directly to the input. We find that the errors introduced by automatic AMR parsing substantially limit the effectiveness of this approach, but a manual editing study indicates that as parsing improves, parsing-based evaluation has the potential to outperform most reference-based metrics.
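The method described in the abstract, parsing the generated English back into AMR and comparing that parse to the input graph, is conventionally scored with Smatch. Below is a minimal illustrative sketch in Python, assuming the smatch PyPI package's get_amr_match and compute_f helpers; parse_to_amr is a hypothetical placeholder for whatever AMR parser one plugs in, not an API from the paper.

import smatch

def parse_to_amr(sentence: str) -> str:
    """Hypothetical placeholder: return a PENMAN-format AMR parse of `sentence`."""
    raise NotImplementedError("plug in an off-the-shelf AMR parser here")

def referenceless_score(input_amr: str, generated_sentence: str) -> float:
    """Smatch F1 between the input AMR and an automatic parse of the generated output."""
    parsed = parse_to_amr(generated_sentence)
    # Count triples matched under the best variable mapping found by Smatch's
    # hill-climbing search (parsed graph as "test", input graph as "gold").
    match_num, test_num, gold_num = smatch.get_amr_match(parsed, input_amr)
    _, _, f_score = smatch.compute_f(match_num, test_num, gold_num)
    smatch.match_triple_dict.clear()  # smatch keeps per-pair state; reset between calls
    return f_score

Because the score is computed against the input graph rather than a reference sentence, any error made by the parser deflates the score; this sensitivity to parser quality is the limitation the paper quantifies.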
Anthology ID:
2021.eval4nlp-1.12
Volume:
Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
Month:
November
Year:
2021
Address:
Punta Cana, Dominican Republic
Editors:
Yang Gao, Steffen Eger, Wei Zhao, Piyawat Lertvittayakumjorn, Marina Fomicheva
Venue:
Eval4NLP
Publisher:
Association for Computational Linguistics
Pages:
114–122
URL:
https://aclanthology.org/2021.eval4nlp-1.12
DOI:
10.18653/v1/2021.eval4nlp-1.12
Cite (ACL):
Emma Manning and Nathan Schneider. 2021. Referenceless Parsing-Based Evaluation of AMR-to-English Generation. In Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems, pages 114–122, Punta Cana, Dominican Republic. Association for Computational Linguistics.
Cite (Informal):
Referenceless Parsing-Based Evaluation of AMR-to-English Generation (Manning & Schneider, Eval4NLP 2021)
PDF:
https://aclanthology.org/2021.eval4nlp-1.12.pdf
Video:
https://aclanthology.org/2021.eval4nlp-1.12.mp4