Tianyi Tang, Hongyuan Lu, Yuchen Jiang, Haoyang Huang, Dongdong Zhang, Xin Zhao, Tom Kocmi, and Furu Wei. 2024. Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6596-6610, Mexico City, Mexico, June 2024. Edited by Kevin Duh, Helena Gomez, and Steven Bethard. Association for Computational Linguistics. Conference publication. Anthology ID: tang-etal-2024-metrics. DOI: 10.18653/v1/2024.naacl-long.367. URL: https://aclanthology.org/2024.naacl-long.367/