%0 Conference Proceedings
%T Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!
%A Kann, Katharina
%A Rothe, Sascha
%A Filippova, Katja
%Y Korhonen, Anna
%Y Titov, Ivan
%S Proceedings of the 22nd Conference on Computational Natural Language Learning
%D 2018
%8 October
%I Association for Computational Linguistics
%C Brussels, Belgium
%F kann-etal-2018-sentence
%X Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level. We further introduce WPSLOR, a novel WordPiece-based version, which harnesses a more compact language model. Even though word-overlap metrics like ROUGE are computed with the help of hand-written references, our referenceless methods obtain a significantly higher correlation with human fluency scores on a benchmark dataset of compressed sentences. Finally, we present ROUGE-LM, a reference-based metric which is a natural extension of WPSLOR to the case of available references. We show that ROUGE-LM yields a significantly higher correlation with human judgments than all baseline metrics, including WPSLOR on its own.
%R 10.18653/v1/K18-1031
%U https://aclanthology.org/K18-1031
%U https://doi.org/10.18653/v1/K18-1031
%P 313-323