Towards Arabic Sentence Simplification via Classification and Generative Approaches

Nouran Khallaf; Serge Sharoff; Rasha Soliman

doi:10.18653/v1/2022.wanlp-1.5

Abstract

This paper presents an attempt to build a Modern Standard Arabic (MSA) sentence-level simplification system. We experimented with two approaches to sentence simplification: (i) a classification approach leading to lexical simplification pipelines that use Arabic-BERT, a pre-trained contextualised model, together with a model of fastText word embeddings; and (ii) a generative approach, a Seq2Seq technique applying mT5, the multilingual Text-to-Text Transfer Transformer. We developed our training corpus by aligning the original and simplified sentences from the internationally acclaimed Arabic novel Saaq al-Bambuu (The Bamboo Stalk). We evaluate the effectiveness of these methods by comparing the generated simple sentences to the target simple sentences using the BERTScore evaluation metric. The simple sentences produced by the mT5 model achieve a BERTScore precision (P) of 0.72, recall (R) of 0.68 and F-1 of 0.70, while the combination of Arabic-BERT and fastText achieves P 0.97, R 0.97 and F-1 0.97. In addition, we report a manual error analysis for these experiments.
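The P, R and F-1 figures above are BERTScore values. As a minimal sketch of how such scores are derived, the function below implements BERTScore's greedy cosine matching between token embeddings of a generated sentence and a target sentence, without the optional idf weighting or baseline rescaling. It assumes the contextual token embeddings have already been computed by some pretrained model; the function names and toy vectors are illustrative only, and the exact BERTScore configuration used in the paper is not stated here.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bertscore_greedy(candidate: List[Vector],
                     reference: List[Vector]) -> Tuple[float, float, float]:
    """Greedy-matching BERTScore over pre-computed token embeddings (no idf, no rescaling).

    candidate: embeddings of the generated (system) sentence's tokens
    reference: embeddings of the target simple sentence's tokens
    """
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(cosine(c, r) for r in reference) for c in candidate) / len(candidate)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(max(cosine(r, c) for c in candidate) for r in reference) / len(reference)
    # F-1 is the harmonic mean of precision and recall.
    f1 = 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example: the candidate has one token matching the reference and one unrelated token.
    p, r, f = bertscore_greedy([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0]])
    print(f"P={p:.2f} R={r:.2f} F-1={f:.2f}")
```

In this toy case the extra, unmatched candidate token lowers precision (0.50) while recall stays at 1.00, giving F-1 of about 0.67; this asymmetry is why P and R are reported separately, as with the mT5 results above.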

Anthology ID:: 2022.wanlp-1.5
Volume:: Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP)
Month:: December
Year:: 2022
Address:: Abu Dhabi, United Arab Emirates (Hybrid)
Editors:: Houda Bouamor, Hend Al-Khalifa, Kareem Darwish, Owen Rambow, Fethi Bougares, Ahmed Abdelali, Nadi Tomeh, Salam Khalifa, Wajdi Zaghouani
Venue:: WANLP
SIG:: SIGARAB
Publisher:: Association for Computational Linguistics
Pages:: 43–52
URL:: https://aclanthology.org/2022.wanlp-1.5/
DOI:: 10.18653/v1/2022.wanlp-1.5
Cite (ACL):: Nouran Khallaf, Serge Sharoff, and Rasha Soliman. 2022. Towards Arabic Sentence Simplification via Classification and Generative Approaches. In Proceedings of the Seventh Arabic Natural Language Processing Workshop (WANLP), pages 43–52, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):: Towards Arabic Sentence Simplification via Classification and Generative Approaches (Khallaf et al., WANLP 2022)
PDF:: https://aclanthology.org/2022.wanlp-1.5.pdf