Towards an Automated Pointwise Evaluation Metric for Generated Long-Form Legal Summaries

Shao Min Tan, Quentin Grail, Lee Quartey

Abstract
Long-form abstractive summarization is a task of particular importance in the legal domain. Automated evaluation metrics are important for the development of text generation models, but existing research on the evaluation of generated summaries has focused mainly on short summaries. We introduce an automated evaluation methodology for generated long-form legal summaries: each summary is broken into individual points, the points of the human-written and machine-generated summaries are compared, and recall and precision scores are computed for the latter. The method is designed to be particularly suited to the complexities of legal text, and it is fully interpretable. We also create and release a small meta-dataset for benchmarking evaluation methods, focusing on long-form legal summarization. Our evaluation metric corresponds better with human evaluation than existing metrics, which were not developed for legal data.
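The abstract's recipe (split each summary into points, match points across the human-written and machine-generated summaries, then score the generated one) can be sketched directly. Below is a minimal, illustrative Python sketch, not the authors' released implementation: the naive point splitter and the token-overlap matcher are placeholder assumptions, whereas the actual method would use point extraction and matching suited to legal text.

from typing import Callable, List


def split_into_points(summary: str) -> List[str]:
    """Naive point splitter: treats each sentence-like segment as one point.
    Placeholder for the paper's more careful decomposition."""
    return [p.strip() for p in summary.replace(". ", ".\n").splitlines() if p.strip()]


def pointwise_scores(
    reference_points: List[str],
    generated_points: List[str],
    is_match: Callable[[str, str], bool],
) -> dict:
    """Recall: fraction of reference points covered by some generated point.
    Precision: fraction of generated points supported by some reference point."""
    recalled = sum(
        any(is_match(ref, gen) for gen in generated_points)
        for ref in reference_points
    )
    supported = sum(
        any(is_match(ref, gen) for ref in reference_points)
        for gen in generated_points
    )
    recall = recalled / len(reference_points) if reference_points else 0.0
    precision = supported / len(generated_points) if generated_points else 0.0
    return {"recall": recall, "precision": precision}


if __name__ == "__main__":
    # Toy matcher: token-level Jaccard overlap above a threshold. A real
    # system would use a semantic-similarity or entailment model instead.
    def overlap_match(a: str, b: str, threshold: float = 0.5) -> bool:
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / max(len(ta | tb), 1) >= threshold

    ref = "The court granted the motion. Damages were awarded to the plaintiff."
    gen = "The court granted the motion. The defendant appealed the ruling."
    print(pointwise_scores(split_into_points(ref), split_into_points(gen), overlap_match))

Under this framing, recall penalizes omitted reference points and precision penalizes unsupported generated points, which is what makes the metric interpretable point by point.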
Anthology ID:
2024.nllp-1.10
Volume:
Proceedings of the Natural Legal Language Processing Workshop 2024
Month:
November
Year:
2024
Address:
Miami, FL, USA
Editors:
Nikolaos Aletras, Ilias Chalkidis, Leslie Barrett, Cătălina Goanță, Daniel Preoțiuc-Pietro, Gerasimos Spanakis
Venue:
NLLP
Publisher:
Association for Computational Linguistics
Pages:
129–142
URL:
https://aclanthology.org/2024.nllp-1.10
Cite (ACL):
Shao Min Tan, Quentin Grail, and Lee Quartey. 2024. Towards an Automated Pointwise Evaluation Metric for Generated Long-Form Legal Summaries. In Proceedings of the Natural Legal Language Processing Workshop 2024, pages 129–142, Miami, FL, USA. Association for Computational Linguistics.
Cite (Informal):
Towards an Automated Pointwise Evaluation Metric for Generated Long-Form Legal Summaries (Tan et al., NLLP 2024)
PDF:
https://aclanthology.org/2024.nllp-1.10.pdf