CASM - Context and Something More in Lexical Simplification

Atharva Kumbhar; Sheetal Sonawane; Dipali Kadam; Prathamesh Mulay

Abstract

Lexical Simplification is a challenging task that aims to improve the readability of text for non-native speakers, people with dyslexia, and people with other linguistic impairments. It consists of three components: (1) Complex Word Identification, (2) Substitute Generation, and (3) Substitute Ranking. Current methods use contextual information as the primary source in all three stages of the simplification pipeline. We argue that while context is an important measure, it alone is not sufficient. In the complex word identification step, contextual information is inadequate; moreover, heavy feature engineering is required to use additional linguistic features. This paper presents a novel architecture for complex word identification that uses a pre-trained transformer model’s information flow through its hidden layers as a feature representation that implicitly encodes all the features required for identification. We show how database methods and masked language modeling can complement one another in a substitute generation and ranking process built on the foundational pillars of simplicity, grammatical and semantic correctness, and context preservation. We show that our proposed model generalizes well and outperforms the current state of the art on well-known datasets.

Anthology ID: 2023.icon-1.46
Volume: Proceedings of the 20th International Conference on Natural Language Processing (ICON)
Month: December
Year: 2023
Address: Goa University, Goa, India
Editors: Jyoti D. Pawar, Sobha Lalitha Devi
Venue: ICON
SIG: SIGLEX
Publisher: NLP Association of India (NLPAI)
Pages: 506–515
URL: https://aclanthology.org/2023.icon-1.46/
Cite (ACL): Atharva Kumbhar, Sheetal Sonawane, Dipali Kadam, and Prathamesh Mulay. 2023. CASM - Context and Something More in Lexical Simplification. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), pages 506–515, Goa University, Goa, India. NLP Association of India (NLPAI).
Cite (Informal): CASM - Context and Something More in Lexical Simplification (Kumbhar et al., ICON 2023)
PDF: https://aclanthology.org/2023.icon-1.46.pdf
