Constructing Pseudo-parallel Swedish Sentence Corpora for Automatic Text Simplification

Daniel Holmer, Evelina Rennes


Abstract
Automatic text simplification (ATS) is the automatic transformation of a text from a complex form to a less complex form. Many modern ATS techniques need large parallel corpora of standard and simplified text, but such data does not exist for many languages. One way to overcome this issue is to create pseudo-parallel corpora by dividing existing corpora into standard and simple parts. In this work, we explore the creation of Swedish pseudo-parallel monolingual corpora by applying different feature representation methods, sentence alignment algorithms, and indexing approaches to a large monolingual corpus. The resulting corpora are used to fine-tune a sentence simplification system based on BART, which is evaluated with standard evaluation metrics for automatic text simplification.
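The idea of pairing standard and simple sentences by similarity can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not name the feature representations, alignment algorithms, or indexing approaches used, so the bag-of-words vectors, the brute-force search, the function names (embed, cosine, align), and the thresholds below are all assumptions chosen only to keep the example self-contained.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy feature representation (assumption): lowercase word counts."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align(standard, simple, low=0.5, high=0.99):
    """Pair each standard sentence with its most similar simple sentence.

    A pair is kept only if its similarity lies in [low, high]: similar
    enough to be about the same content, but not a near-identical copy.
    Both thresholds are hypothetical values for illustration.
    Brute-force search stands in for a real nearest-neighbour index.
    """
    simple_vecs = [embed(s) for s in simple]
    pairs = []
    for src in standard:
        src_vec = embed(src)
        scores = [cosine(src_vec, vec) for vec in simple_vecs]
        if not scores:
            continue
        best = max(range(len(scores)), key=scores.__getitem__)
        if low <= scores[best] <= high:
            pairs.append((src, simple[best], scores[best]))
    return pairs


standard = [
    "the committee decided to postpone the meeting until next week",
    "the weather is nice",
]
simple = [
    "the committee moved the meeting to next week",
    "cats sleep a lot",
]
pairs = align(standard, simple)
for src, tgt, score in pairs:
    print(f"{score:.2f}\t{src}\t{tgt}")
```

In a realistic setting the toy embed function would be replaced by a learned sentence representation and the inner loop by an approximate nearest-neighbour index, since comparing every sentence pair does not scale to a large corpus.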
Anthology ID:
2023.nodalida-1.13
Volume:
Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
Month:
May
Year:
2023
Address:
Tórshavn, Faroe Islands
Editors:
Tanel Alumäe, Mark Fishel
Venue:
NoDaLiDa
Publisher:
University of Tartu Library
Pages:
113–123
URL:
https://aclanthology.org/2023.nodalida-1.13
Cite (ACL):
Daniel Holmer and Evelina Rennes. 2023. Constructing Pseudo-parallel Swedish Sentence Corpora for Automatic Text Simplification. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 113–123, Tórshavn, Faroe Islands. University of Tartu Library.
Cite (Informal):
Constructing Pseudo-parallel Swedish Sentence Corpora for Automatic Text Simplification (Holmer & Rennes, NoDaLiDa 2023)
PDF:
https://aclanthology.org/2023.nodalida-1.13.pdf