Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences

Evelina Rennes


Abstract
Parallel monolingual resources are imperative for data-driven sentence simplification research. We present work on aligning, at the sentence level, a corpus of web texts from all Swedish public authorities and municipalities, written in standard and simple Swedish. We compare the performance of three alignment algorithms used for similar work in English (Average Alignment, Maximum Alignment, and Hungarian Alignment), and use the best-performing algorithm to create a resource of 15,433 unique sentence pairs. We evaluate the resulting corpus using a set of features that has been shown to predict the text complexity of Swedish texts. The results show that the sentences of the simple sub-corpus are indeed less complex than those of the standard sub-corpus, according to many of the text complexity measures.
Anthology ID:
2020.readi-1.2
Volume:
Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI)
Month:
May
Year:
2020
Address:
Marseille, France
Editors:
Núria Gala, Rodrigo Wilkens
Venue:
READI
Publisher:
European Language Resources Association
Pages:
6–13
Language:
English
URL:
https://aclanthology.org/2020.readi-1.2
Cite (ACL):
Evelina Rennes. 2020. Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences. In Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI), pages 6–13, Marseille, France. European Language Resources Association.
Cite (Informal):
Is it simpler? An Evaluation of an Aligned Corpus of Standard-Simple Sentences (Rennes, READI 2020)
PDF:
https://aclanthology.org/2020.readi-1.2.pdf