Referenceless Parsing-Based Evaluation of AMR-to-English Generation

Emma Manning, Nathan Schneider


Abstract
Reference-based automatic evaluation metrics are notoriously limited for NLG due to their inability to fully capture the range of possible outputs. We examine a referenceless alternative: evaluating the adequacy of English sentences generated from Abstract Meaning Representation (AMR) graphs by parsing into AMR and comparing the parse directly to the input. We find that the errors introduced by automatic AMR parsing substantially limit the effectiveness of this approach, but a manual editing study indicates that as parsing improves, parsing-based evaluation has the potential to outperform most reference-based metrics.
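The parse-and-compare idea described above can be illustrated with a minimal sketch. It assumes that both the input AMR and the parse of the generated sentence have already been reduced to sets of (source concept, relation, target concept) triples. It scores their overlap with a plain F1. This is a deliberate simplification and not necessarily the comparison used in the paper: real AMR graph-overlap metrics must also search for an alignment between graph variables, which is skipped here. The function name `triple_f1` and the example graphs are hypothetical.

```python
from typing import Set, Tuple

# A triple is (source concept, relation, target concept).
# Assumption: variables are already resolved to concept labels,
# so no variable alignment search is needed.
Triple = Tuple[str, str, str]


def triple_f1(input_triples: Set[Triple], parsed_triples: Set[Triple]) -> float:
    """F1 overlap between the input graph and the parse of the generated sentence."""
    if not input_triples or not parsed_triples:
        return 0.0
    matched = len(input_triples & parsed_triples)
    if matched == 0:
        return 0.0
    precision = matched / len(parsed_triples)  # parsed triples found in the input
    recall = matched / len(input_triples)      # input triples recovered by the parse
    return 2 * precision * recall / (precision + recall)


# Hypothetical input graph, roughly "The boy wants to go".
input_graph = {
    ("want-01", ":ARG0", "boy"),
    ("want-01", ":ARG1", "go-02"),
    ("go-02", ":ARG0", "boy"),
}

# Hypothetical parse of a generated sentence that lost one relation.
parsed_graph = {
    ("want-01", ":ARG0", "boy"),
    ("want-01", ":ARG1", "go-02"),
}

score = triple_f1(input_graph, parsed_graph)
```

In this setup a low score can reflect either a genuine adequacy problem in the generated sentence or an error made by the parser, which is the limitation the abstract points to.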
Anthology ID: 2021.eval4nlp-1.12
Volume: Proceedings of the 2nd Workshop on Evaluation and Comparison of NLP Systems
Month: November
Year: 2021
Address: Punta Cana, Dominican Republic
Venues: EMNLP | Eval4NLP
Publisher: Association for Computational Linguistics
Pages: 114–122
URL: https://aclanthology.org/2021.eval4nlp-1.12
DOI: 10.18653/v1/2021.eval4nlp-1.12
PDF: https://aclanthology.org/2021.eval4nlp-1.12.pdf