Lukasz Golab


2022

GRS: Combining Generation and Revision in Unsupervised Sentence Simplification
Mohammad Dehghan | Dhruv Kumar | Lukasz Golab
Findings of the Association for Computational Linguistics: ACL 2022

We propose GRS: an unsupervised approach to sentence simplification that combines text generation and text revision. We start with an iterative framework in which an input sentence is revised using explicit edit operations, and add paraphrasing as a new edit operation. This allows us to combine the advantages of generative and revision-based approaches: paraphrasing captures complex edit operations, and the use of explicit edit operations in an iterative manner provides controllability and interpretability. We demonstrate these advantages of GRS over existing methods on the Newsela and ASSET datasets.

2020

Iterative Edit-Based Unsupervised Sentence Simplification
Dhruv Kumar | Lili Mou | Lukasz Golab | Olga Vechtomova
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present a novel iterative, edit-based approach to unsupervised sentence simplification. Our model is guided by a scoring function involving fluency, simplicity, and meaning preservation, and it iteratively performs word- and phrase-level edits on the complex sentence. Compared with previous approaches, our model does not require a parallel training set and is more controllable and interpretable. Experiments on the Newsela and WikiLarge datasets show that our approach is nearly as effective as state-of-the-art supervised approaches.
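The general idea in this abstract (score-guided, iterative local edits) can be illustrated with a minimal hill-climbing sketch. Everything below is an assumption for illustration, not the paper's implementation: the three scorers are toy stand-ins (not language-model or embedding-based scores), `SIMPLE_WORDS` is a hypothetical substitution lexicon, the score is assumed to be a weighted product of the components, and only two edit types (word deletion and lexical substitution) are modeled.

```python
from typing import Dict, Iterator, List, Tuple

Sentence = List[str]

# Hypothetical substitution lexicon, for illustration only.
SIMPLE_WORDS: Dict[str, str] = {"utilize": "use", "commence": "start", "approximately": "about"}


def fluency(cand: Sentence) -> float:
    """Toy stand-in for a fluency score: penalize adjacent repeated words."""
    repeats = sum(1 for a, b in zip(cand, cand[1:]) if a == b)
    return 1.0 / (1 + repeats)


def simplicity(cand: Sentence) -> float:
    """Toy stand-in for a simplicity score: inverse of average word length."""
    if not cand:
        return 1e-6
    return len(cand) / sum(len(w) for w in cand)


def meaning(cand: Sentence, source: Sentence) -> float:
    """Toy stand-in for meaning preservation: share of source words kept
    (a word and its simpler substitute count as the same word)."""
    src = {SIMPLE_WORDS.get(w, w) for w in source}
    kept = {SIMPLE_WORDS.get(w, w) for w in cand}
    return max(len(src & kept) / max(len(src), 1), 1e-6)


def score(cand: Sentence, source: Sentence, weights: Tuple[float, float, float]) -> float:
    """Assumed weighted product of the three component scores."""
    a, b, c = weights
    return fluency(cand) ** a * simplicity(cand) ** b * meaning(cand, source) ** c


def propose_edits(sent: Sentence) -> Iterator[Sentence]:
    """Yield every sentence reachable by one explicit edit."""
    for i, word in enumerate(sent):
        if len(sent) > 1:
            yield sent[:i] + sent[i + 1:]  # word deletion
        if word in SIMPLE_WORDS:
            yield sent[:i] + [SIMPLE_WORDS[word]] + sent[i + 1:]  # lexical substitution


def simplify(source: Sentence, max_iters: int = 10,
             weights: Tuple[float, float, float] = (1.0, 1.0, 3.0)) -> Sentence:
    """Greedy hill climbing: apply the best-scoring edit until none improves the score."""
    current = list(source)
    best = score(current, source, weights)
    for _ in range(max_iters):
        candidates = list(propose_edits(current))
        if not candidates:
            break
        cand = max(candidates, key=lambda c: score(c, source, weights))
        s = score(cand, source, weights)
        if s <= best:
            break  # no single edit improves the score
        current, best = cand, s
    return current


if __name__ == "__main__":
    print(" ".join(simplify("we will utilize approximately ten tools".split())))
```

Because each accepted step is one explicit edit, the path from input to output can be inspected, which is the kind of controllability and interpretability the abstract refers to. The weight on meaning preservation (3.0 here, an arbitrary choice) is what keeps the toy search from simply deleting long words.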

2019

Online abuse detection: the value of preprocessing and neural attention models
Dhruv Kumar | Robin Cohen | Lukasz Golab
Proceedings of the Tenth Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

We propose an attention-based neural network approach to detect abusive speech in online social networks. Our approach enables more effective modeling of context and of the semantic relationships between words. We also empirically evaluate the value of text preprocessing techniques in addressing the challenge of out-of-vocabulary words in toxic content. Finally, we conduct extensive experiments on the Wikipedia Talk page datasets, showing improved predictive power over the previous state of the art.