NumHG: A Dataset for Number-Focused Headline Generation

Jian-Tao Huang; Chung-Chi Chen; Hen-Hsen Huang; Hsin-Hsi Chen

NumHG: A Dataset for Number-Focused Headline Generation

Jian-Tao Huang, Chung-Chi Chen, Hen-Hsen Huang, Hsin-Hsi Chen

Abstract

Headline generation, a key task in abstractive summarization, strives to condense a full-length article into a succinct, single line of text. Notably, while contemporary encoder-decoder models excel based on the ROUGE metric, they often falter when it comes to the precise generation of numerals in headlines. We identify the lack of datasets providing fine-grained annotations for accurate numeral generation as a major roadblock. To address this, we introduce a new dataset, the NumHG, and provide over 27,000 annotated numeral-rich news articles for detailed investigation. Further, we evaluate five well-performing models from previous headline-generation tasks using human evaluation in terms of numerical accuracy, reasonableness, and readability. Our study reveals a need for improvement in numerical accuracy, demonstrating the potential of the NumHG dataset to drive progress in number-focused headline generation and stimulate further discussions in numeral-focused text generation.

Anthology ID:: 2024.lrec-main.1078
Volume:: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Month:: May
Year:: 2024
Address:: Torino, Italia
Editors:: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Venues:: LREC | COLING
SIG:
Publisher:: ELRA and ICCL
Note:
Pages:: 12323–12329
Language:
URL:: https://aclanthology.org/2024.lrec-main.1078
DOI:
Bibkey:
Cite (ACL):: Jian-Tao Huang, Chung-Chi Chen, Hen-Hsen Huang, and Hsin-Hsi Chen. 2024. NumHG: A Dataset for Number-Focused Headline Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 12323–12329, Torino, Italia. ELRA and ICCL.
Cite (Informal):: NumHG: A Dataset for Number-Focused Headline Generation (Huang et al., LREC-COLING 2024)
Copy Citation:
PDF:: https://aclanthology.org/2024.lrec-main.1078.pdf

PDF Cite Search