Towards an Automated Pointwise Evaluation Metric for Generated Long-Form Legal Summaries
Shao Min Tan | Quentin Grail | Lee Quartey
Proceedings of the Natural Legal Language Processing Workshop 2024
Long-form abstractive summarization is a task of particular importance in the legal domain. Automated evaluation metrics are essential for the development of text generation models, but existing research on the evaluation of generated summaries has focused mainly on short summaries. We introduce an automated evaluation methodology for generated long-form legal summaries: each summary is broken into individual points, the points of the machine-generated summary are compared against those of the human-written reference, and recall and precision scores are computed for the generated summary. The method is designed for the complexities of legal text and is fully interpretable. We also create and release a small meta-dataset for benchmarking evaluation methods, focusing on long-form legal summarization. Our evaluation metric correlates better with human judgments than existing metrics, which were not developed for legal data.
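The abstract describes the metric at a high level: split each summary into points, match points across the generated and reference summaries, and score the generated summary by precision and recall over the matched points. The sketch below illustrates that scoring logic only; `split_into_points` and `points_match` are hypothetical stand-ins for the paper's actual point-extraction and point-comparison steps, which are not specified here.

```python
from typing import Callable, List, Tuple


def split_into_points(summary: str) -> List[str]:
    """Hypothetical point extraction: one point per sentence.

    The paper's actual method for segmenting a long-form summary
    into individual points may differ from this naive split.
    """
    return [s.strip() for s in summary.split(".") if s.strip()]


def pointwise_scores(
    generated: str,
    reference: str,
    points_match: Callable[[str, str], bool],
) -> Tuple[float, float]:
    """Score a generated summary against a human-written reference.

    Precision: fraction of generated points supported by some
    reference point. Recall: fraction of reference points covered
    by some generated point. `points_match` is assumed to decide
    whether two points convey the same content.
    """
    gen_points = split_into_points(generated)
    ref_points = split_into_points(reference)

    # Count generated points that match at least one reference point.
    matched_gen = sum(
        any(points_match(g, r) for r in ref_points) for g in gen_points
    )
    # Count reference points covered by at least one generated point.
    matched_ref = sum(
        any(points_match(r, g) for g in gen_points) for r in ref_points
    )

    precision = matched_gen / len(gen_points) if gen_points else 0.0
    recall = matched_ref / len(ref_points) if ref_points else 0.0
    return precision, recall
```

A point-level decomposition like this is what makes the scores interpretable: every credited or missing point can be traced back to the specific reference point it matched, or failed to match.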