Summarizing Long Regulatory Documents with a Multi-Step Pipeline

Mika Sie; Ruby Beek; Michiel Bots; Sjaak Brinkkemper; Albert Gatt

Abstract

Due to their length and complexity, long regulatory texts are challenging to summarize. To address this, a multi-step extractive-abstractive architecture is proposed to handle lengthy regulatory documents more effectively. In this paper, we show that the effectiveness of a two-step architecture for summarizing long regulatory texts varies significantly depending on the model used. Specifically, the two-step architecture improves the performance of decoder-only models. For abstractive encoder-decoder models with short context lengths, the effectiveness of an extractive step varies, whereas for long-context encoder-decoder models, the extractive step worsens their performance. This research also highlights the challenges of evaluating generated texts, as evidenced by the differing results from human and automated evaluations. Most notably, human evaluations favoured language models pretrained on legal text, while automated metrics rank general-purpose language models higher. The results underscore the importance of selecting the appropriate summarization strategy based on model architecture and context length.

Anthology ID: 2024.nllp-1.2
Volume: Proceedings of the Natural Legal Language Processing Workshop 2024
Month: November
Year: 2024
Address: Miami, FL, USA
Editors: Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue: NLLP
Publisher: Association for Computational Linguistics
Pages: 18–32
URL: https://aclanthology.org/2024.nllp-1.2
Cite (ACL): Mika Sie, Ruby Beek, Michiel Bots, Sjaak Brinkkemper, and Albert Gatt. 2024. Summarizing Long Regulatory Documents with a Multi-Step Pipeline. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 18–32, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal): Summarizing Long Regulatory Documents with a Multi-Step Pipeline (Sie et al., NLLP 2024)
PDF: https://aclanthology.org/2024.nllp-1.2.pdf
