Developing a Monolingual Sentence Simplification Corpus for Urdu

Yusra Anees, Sadaf Abdul Rauf, Nauman Iqbal, Abdul Basit Siddiqi


Abstract
Complex sentences are a hurdle for language learners. Sentence simplification aims to convert a complex sentence into a simpler form that is easily comprehensible. A corpus of complex sentences paired with their simplified versions is the first step toward understanding sentence complexity and developing automatic text simplification systems. No such corpus has yet been developed for Urdu; we fill this gap by developing one to help start research on readability and automatic sentence simplification. We present a lexically and syntactically simplified Urdu corpus, along with a detailed analysis of the various simplification operations. We further analyze our corpora using text readability measures and compare the original, lexically simplified, and syntactically simplified corpora.
Anthology ID:
2020.winlp-1.23
Volume:
Proceedings of the Fourth Widening Natural Language Processing Workshop
Month:
July
Year:
2020
Address:
Seattle, USA
Editors:
Rossana Cunha, Samira Shaikh, Erika Varis, Ryan Georgi, Alicia Tsai, Antonios Anastasopoulos, Khyathi Raghavi Chandu
Venue:
WiNLP
Publisher:
Association for Computational Linguistics
Pages:
92–95
URL:
https://aclanthology.org/2020.winlp-1.23
DOI:
10.18653/v1/2020.winlp-1.23
Cite (ACL):
Yusra Anees, Sadaf Abdul Rauf, Nauman Iqbal, and Abdul Basit Siddiqi. 2020. Developing a Monolingual Sentence Simplification Corpus for Urdu. In Proceedings of the Fourth Widening Natural Language Processing Workshop, pages 92–95, Seattle, USA. Association for Computational Linguistics.
Cite (Informal):
Developing a Monolingual Sentence Simplification Corpus for Urdu (Anees et al., WiNLP 2020)
Video:
http://slideslive.com/38929561