Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings

Abdul Moeed; Yang An; Gerhard Hagerer; Georg Groh

Abstract

With the explosive growth in textual data, it is becoming increasingly important to summarize text automatically. Recently, generative language models have shown promise in abstractive text summarization tasks. Since these models rephrase text and thus use words that are similar to, but different from, those found in the summarized text, existing metrics such as ROUGE that rely on n-gram overlap may not be optimal. Therefore, we evaluate two embedding-based evaluation metrics that are applicable to abstractive summarization: Fréchet embedding distance, which was introduced recently, and angular embedding similarity, which is our proposed metric. To demonstrate the utility of both metrics, we analyze the headline generation capacity of two state-of-the-art language models: GPT-2 and ULMFiT. In particular, our proposed metric shows a close relation with human judgments in our experiments and has overall better correlations with them. To provide reproducibility, the source code and human assessments of our experiments are available on GitHub.
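As a rough illustration of what an angular embedding similarity can look like, the sketch below assumes the common definition of one minus the angle between two embedding vectors divided by π; the abstract does not state the formula, so the paper's exact formulation may differ. The function names `cosine_similarity` and `angular_similarity` are hypothetical, and the short lists stand in for real sentence embeddings of a generated and a reference headline.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("angular similarity is undefined for zero vectors")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return max(-1.0, min(1.0, dot / (norm_u * norm_v)))


def angular_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Assumed definition: 1 - angle / pi, ranging from 0 (opposite) to 1 (same direction)."""
    return 1.0 - math.acos(cosine_similarity(u, v)) / math.pi


# Toy vectors standing in for headline embeddings (not real model output).
generated = [0.2, 0.7, 0.1]
reference = [0.25, 0.6, 0.2]
score = angular_similarity(generated, reference)
```

Unlike n-gram overlap, a score of this kind depends only on the direction of the two embeddings, so a paraphrased headline can still score highly if the embedding model places it near the reference.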

Anthology ID: 2020.lrec-1.222
Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
Month: May
Year: 2020
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 1796–1802
Language: English
URL: https://aclanthology.org/2020.lrec-1.222/
Cite (ACL): Abdul Moeed, Yang An, Gerhard Hagerer, and Georg Groh. 2020. Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1796–1802, Marseille, France. European Language Resources Association.
Cite (Informal): Evaluation Metrics for Headline Generation Using Deep Pre-Trained Embeddings (Moeed et al., LREC 2020)
PDF: https://aclanthology.org/2020.lrec-1.222.pdf
