Title: How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation
Authors: Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, Joelle Pineau
Date: 2016-11
Type: text (conference publication)
Published in: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
Editors: Jian Su, Kevin Duh, Xavier Carreras
Publisher: Association for Computational Linguistics
Location: Austin, Texas
Identifier: liu-etal-2016-evaluate
DOI: 10.18653/v1/D16-1230
URL: https://aclanthology.org/D16-1230/
Pages: 2122-2132