Low-Resource Text Style Transfer for Bangla: Data & Models

Sourabrata Mukherjee, Akanksha Bansal, Pritha Majumdar, Atul Kr. Ojha, Ondřej Dušek


Abstract
Text style transfer (TST) involves modifying the linguistic style of a given text while retaining its core content. This paper addresses the challenging task of text style transfer in Bangla, a language with few resources for this task. We present a novel Bangla dataset for text sentiment transfer, a subtask of TST that transforms positive-sentiment sentences into negative ones and vice versa. To establish a high-quality base for further research, we refined and corrected an existing English sentiment transfer dataset of 1,000 sentences based on Yelp reviews, and we introduce a new human-translated Bangla dataset that parallels its English counterpart. Furthermore, we offer multiple benchmark models that validate the dataset and serve as baselines for further research.
Anthology ID:
2023.banglalp-1.5
Volume:
Proceedings of the First Workshop on Bangla Language Processing (BLP-2023)
Month:
December
Year:
2023
Address:
Singapore
Editors:
Firoj Alam, Sudipta Kar, Shammur Absar Chowdhury, Farig Sadeque, Ruhul Amin
Venue:
BanglaLP
Publisher:
Association for Computational Linguistics
Pages:
34–47
URL:
https://aclanthology.org/2023.banglalp-1.5
DOI:
10.18653/v1/2023.banglalp-1.5
Cite (ACL):
Sourabrata Mukherjee, Akanksha Bansal, Pritha Majumdar, Atul Kr. Ojha, and Ondřej Dušek. 2023. Low-Resource Text Style Transfer for Bangla: Data & Models. In Proceedings of the First Workshop on Bangla Language Processing (BLP-2023), pages 34–47, Singapore. Association for Computational Linguistics.
Cite (Informal):
Low-Resource Text Style Transfer for Bangla: Data & Models (Mukherjee et al., BanglaLP 2023)
PDF:
https://aclanthology.org/2023.banglalp-1.5.pdf