RNSum: A Large-Scale Dataset for Automatic Release Note Generation via Commit Logs Summarization

Hisashi Kamezawa, Noriki Nishida, Nobuyuki Shimizu, Takashi Miyazaki, Hideki Nakayama


Abstract
A release note is a technical document that describes the latest changes to a software product and is crucial in open source software development. However, generating release notes automatically remains challenging. In this paper, we present a new dataset called RNSum, which contains approximately 82,000 English release notes and the associated commit messages derived from online repositories on GitHub. We then propose classwise extractive-then-abstractive and abstractive summarization approaches to this task, which can employ a modern transformer-based seq2seq network like BART and can be applied to various repositories without specific constraints. The experimental results on the RNSum dataset show that the proposed methods can generate less noisy release notes at higher coverage than the baselines. We also observe that there is a significant gap in the coverage of essential information when compared to human references. Our dataset and the code are publicly available.
Anthology ID: 2022.acl-long.597
Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: May
Year: 2022
Address: Dublin, Ireland
Editors: Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 8718–8735
URL: https://aclanthology.org/2022.acl-long.597
DOI: 10.18653/v1/2022.acl-long.597
Cite (ACL): Hisashi Kamezawa, Noriki Nishida, Nobuyuki Shimizu, Takashi Miyazaki, and Hideki Nakayama. 2022. RNSum: A Large-Scale Dataset for Automatic Release Note Generation via Commit Logs Summarization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8718–8735, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal): RNSum: A Large-Scale Dataset for Automatic Release Note Generation via Commit Logs Summarization (Kamezawa et al., ACL 2022)
PDF: https://aclanthology.org/2022.acl-long.597.pdf
Video: https://aclanthology.org/2022.acl-long.597.mp4