Shironaam: Bengali News Headline Generation using Auxiliary Information

Abu Ubaida Akash, Mir Tafseer Nayeem, Faisal Tareque Shohan, Tanvir Islam


Abstract
Automatic headline generation systems have the potential to assist editors in finding interesting headlines to attract visitors or readers. However, headline generation remains challenging because of the scarcity of parallel data for low-resource languages like Bengali and the lack of established approaches for building headline generation systems with pre-trained language models, especially for long news articles. To address these challenges, we present Shironaam, a large-scale dataset in Bengali containing over 240K news article-headline pairs with auxiliary data such as image captions, topic words, and category information. Unlike prior work on headline generation, we use this auxiliary information to better model the task. Furthermore, we use contextualized language models to design an encoder-decoder model for Bengali news headline generation and follow a simple yet cost-effective coarse-to-fine approach that uses topic words to retrieve important sentences, given the fixed input length of pre-trained language models. Finally, we conduct extensive experiments on our dataset, which contains news articles from 13 different categories, to demonstrate the effectiveness of incorporating auxiliary information, and we evaluate our system on a wide range of metrics. The experimental results demonstrate that our methods bring significant improvements (i.e., 3 to 10 percentage points across all evaluation metrics) over the baselines. To illustrate the utility and robustness of our approach, we also report experimental results in few-shot and non-few-shot settings.
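The abstract only outlines the coarse step of retrieving important sentences with topic words under a fixed length limit. As a rough illustration of that idea, and not the authors' implementation, the sketch below scores each sentence by how many topic words it contains and greedily keeps the best ones within a token budget. The function name `select_sentences`, the whitespace tokenization, and the greedy ranking are all assumptions made for this example; a real system would more likely count tokens with the model's own subword tokenizer.

```python
from typing import List, Set


def select_sentences(sentences: List[str], topic_words: Set[str], max_tokens: int) -> List[str]:
    """Hypothetical helper: keep topic-heavy sentences within a token budget."""
    # Score each sentence by the number of distinct topic words it contains.
    # Whitespace splitting is an assumption made to keep the sketch simple.
    scored = []
    for idx, sent in enumerate(sentences):
        tokens = sent.split()
        overlap = len(topic_words.intersection(tokens))
        scored.append((overlap, idx, len(tokens)))

    # Rank by overlap (highest first), breaking ties by original position.
    scored.sort(key=lambda item: (-item[0], item[1]))

    # Greedily keep every ranked sentence that still fits in the budget.
    kept, used = [], 0
    for _overlap, idx, n_tokens in scored:
        if used + n_tokens <= max_tokens:
            kept.append(idx)
            used += n_tokens

    # Restore document order so the shortened article stays readable.
    return [sentences[i] for i in sorted(kept)]


if __name__ == "__main__":
    article = ["a b c", "x y z w", "a x"]
    print(select_sentences(article, {"a", "x"}, max_tokens=5))
```

The shortened article returned here would then be passed to the encoder-decoder model in place of the full text, which is one plausible reading of "coarse-to-fine" in the abstract.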
Anthology ID:
2023.eacl-main.4
Volume:
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Andreas Vlachos, Isabelle Augenstein
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
52–67
URL:
https://aclanthology.org/2023.eacl-main.4
DOI:
10.18653/v1/2023.eacl-main.4
Cite (ACL):
Abu Ubaida Akash, Mir Tafseer Nayeem, Faisal Tareque Shohan, and Tanvir Islam. 2023. Shironaam: Bengali News Headline Generation using Auxiliary Information. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 52–67, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Shironaam: Bengali News Headline Generation using Auxiliary Information (Akash et al., EACL 2023)
PDF:
https://aclanthology.org/2023.eacl-main.4.pdf
Video:
https://aclanthology.org/2023.eacl-main.4.mp4