Kim Cheng Sheang



2022

Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)
Sanja Štajner | Horacio Saggion | Daniel Ferrés | Matthew Shardlow | Kim Cheng Sheang | Kai North | Marcos Zampieri | Wei Xu
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Controllable Lexical Simplification for English
Kim Cheng Sheang | Daniel Ferrés | Horacio Saggion
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

Fine-tuned Transformer-based approaches have recently shown exciting results on the sentence simplification task. However, so far, no research has applied similar approaches to the Lexical Simplification (LS) task. In this paper, we present ConLS, a Controllable Lexical Simplification system fine-tuned from T5 (a Transformer-based model pre-trained with a BERT-style objective and several other tasks). Evaluation results on three datasets (LexMTurk, BenchLS, and NNSeval) show that our model performs comparably to LSBert (the current state of the art) and even outperforms it in some cases. We also conducted a detailed comparison of the effectiveness of the control tokens to give a clear view of how each token contributes to the model.
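As a hypothetical illustration of how lexical simplification can be cast as a text-to-text task with control tokens (the input format, marker style, and token names below are a sketch, not ConLS's exact scheme), the complex word can be marked in the sentence and token-level controls prepended before feeding the string to the fine-tuned model:

```python
def build_ls_input(sentence: str, complex_word: str,
                   word_length: float = 0.8, word_rank: float = 0.7) -> str:
    """Mark the complex word in brackets and prepend token-level control
    ratios, e.g. target word length and word-frequency rank of the
    substitute relative to the original word."""
    marked = sentence.replace(complex_word, f"[{complex_word}]", 1)
    return f"<WL_{word_length}> <WR_{word_rank}> simplify: {marked}"

print(build_ls_input("He was reluctant to leave.", "reluctant"))
# <WL_0.8> <WR_0.7> simplify: He was [reluctant] to leave.
```

The model would then be trained to generate a ranked list of substitutes (e.g. "hesitant, unwilling") for the bracketed word.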

Findings of the TSAR-2022 Shared Task on Multilingual Lexical Simplification
Horacio Saggion | Sanja Štajner | Daniel Ferrés | Kim Cheng Sheang | Matthew Shardlow | Kai North | Marcos Zampieri
Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022)

We report findings of the TSAR-2022 shared task on multilingual lexical simplification, organized as part of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), held in conjunction with EMNLP 2022. The task called on the Natural Language Processing research community to contribute methods to advance the state of the art in multilingual lexical simplification for English, Portuguese, and Spanish. A total of 14 teams submitted the results of their lexical simplification systems for the provided test data. Results of the shared task indicate new benchmarks in lexical simplification, with English quantitative results noticeably higher than those obtained for Spanish and (Brazilian) Portuguese.

Identification of complex words and passages in medical documents in French
Kim Cheng Sheang | Anaïs Koptient | Natalia Grabar | Horacio Saggion
Actes de la 29e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

The goal of automatic text simplification is to produce a new version of a document that is easier to understand for a given population, or easier to process by other NLP applications. However, before performing simplification, it is important to know exactly what needs to be simplified in a document. Indeed, even in technical and specialized documents, not everything needs to be simplified, only the segments that present comprehension difficulties. This is typically the complex word identification task: diagnosing the difficulty of a given document to detect its complex words and passages. We propose work on the identification of complex words and passages in French biomedical documents.

2021

Controllable Sentence Simplification with a Unified Text-to-Text Transfer Transformer
Kim Cheng Sheang | Horacio Saggion
Proceedings of the 14th International Conference on Natural Language Generation

Recently, a large pre-trained language model called T5 (a Unified Text-to-Text Transfer Transformer) has achieved state-of-the-art performance on many NLP tasks. However, no study has yet applied this pre-trained model to Text Simplification. Therefore, in this paper, we explore fine-tuning T5 on Text Simplification, combined with a controllable mechanism that regulates the system outputs and can help generate text adapted to different target audiences. Our experiments show that our model achieves remarkable results, with gains of between +0.69 and +1.41 over the current state of the art (BART+ACCESS). We argue that using a pre-trained model such as T5, trained on several tasks with large amounts of data, can help improve Text Simplification.
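The controllable mechanism can be illustrated with a small sketch. Assuming ACCESS-style control tokens (the token names and values below are illustrative, not necessarily the paper's exact vocabulary), each token encodes a target ratio between the simplified output and the source sentence and is prepended to the input before fine-tuning or inference:

```python
def build_input(sentence: str,
                char_ratio: float = 0.8,
                levenshtein: float = 0.7,
                word_rank: float = 0.6,
                tree_depth: float = 0.9) -> str:
    """Prepend control tokens to a source sentence. At inference time the
    chosen values steer output length, amount of paraphrasing, lexical
    complexity, and syntactic complexity, respectively."""
    controls = (
        f"<NC_{char_ratio}> "    # character-length (compression) ratio
        f"<LS_{levenshtein}> "   # Levenshtein similarity to the source
        f"<WR_{word_rank}> "     # word-rank (frequency) ratio
        f"<DTD_{tree_depth}>"    # dependency-tree-depth ratio
    )
    return f"{controls} {sentence}"

print(build_input("The committee endeavoured to ameliorate the situation."))
# <NC_0.8> <LS_0.7> <WR_0.6> <DTD_0.9> The committee endeavoured to ameliorate the situation.
```

In a fine-tuning setup the ratios would be computed from each (complex, simple) training pair; at inference they become user-settable knobs for a target audience.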

2019

Multilingual Complex Word Identification: Convolutional Neural Networks with Morphological and Linguistic Features
Kim Cheng Sheang
Proceedings of the Student Research Workshop Associated with RANLP 2019

This paper describes our experiments with a Complex Word Identification system that uses a deep-learning approach combining word embeddings and engineered features.