Transfer Learning for Russian Legal Text Simplification

Mark Athugodage, Olga Mitrofanove, Vadim Gudkov


Abstract
We present novel results in legal text simplification for Russian. We introduce the first dataset for this task in Russian: a parallel corpus based on data extracted from “Rossiyskaya Gazeta Legal Papers”. We discuss three approaches to text simplification that involve the T5 and GPT model architectures. We evaluate the proposed models with a set of metrics: ROUGE, SARI and BERTScore. We also analyse the models’ outputs using readability indices such as the Flesch-Kincaid Grade Level and the Gunning Fog Index. Finally, we performed a human evaluation of the simplified texts generated by the T5 and GPT models; the assessment was carried out by native speakers of Russian and by Russian lawyers. Comparing the T5 and GPT architectures on the text simplification task, we found that GPT performs better when it is fine-tuned on a dataset of coped texts. Our research is a substantial step toward improving the readability and accessibility of Russian legal texts for the general public.
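For readers unfamiliar with the two readability indices named in the abstract, the following is a minimal sketch of their standard English-language formulas. The function names and the choice to pass pre-computed counts are assumptions made for illustration; the abstract does not state which coefficients or which Russian adaptation the authors used, so this should not be read as their implementation.

```python
def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level with the standard English coefficients.

    Assumption: the counts are computed elsewhere; a Russian-specific
    variant would likely use different coefficients.
    """
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index; complex words are usually those with 3+ syllables."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.4 * ((words / sentences) + 100.0 * (complex_words / words))


if __name__ == "__main__":
    # Hypothetical counts for a short passage, not data from the paper.
    print(round(flesch_kincaid_grade(words=100, sentences=5, syllables=150), 2))
    print(round(gunning_fog(words=100, sentences=5, complex_words=10), 2))
```

Both indices rise with longer sentences and with longer or more complex words, so a successful simplification would be expected to lower them relative to the source text.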
Anthology ID:
2024.readi-1.6
Volume:
Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Rodrigo Wilkens, Rémi Cardon, Amalia Todirascu, Núria Gala
Venues:
READI | WS
Publisher:
ELRA and ICCL
Pages:
59–69
URL:
https://aclanthology.org/2024.readi-1.6
Cite (ACL):
Mark Athugodage, Olga Mitrofanove, and Vadim Gudkov. 2024. Transfer Learning for Russian Legal Text Simplification. In Proceedings of the 3rd Workshop on Tools and Resources for People with REAding DIfficulties (READI) @ LREC-COLING 2024, pages 59–69, Torino, Italia. ELRA and ICCL.
Cite (Informal):
Transfer Learning for Russian Legal Text Simplification (Athugodage et al., READI-WS 2024)
PDF:
https://aclanthology.org/2024.readi-1.6.pdf