BITS Pilani at HinglishEval: Quality Evaluation for Code-Mixed Hinglish Text Using Transformers

Shaz Furniturewala, Vijay Kumari, Amulya Ratna Dash, Hriday Kedia, Yashvardhan Sharma


Abstract
Code-Mixed text consists of sentences containing words or phrases from more than one language. Most multilingual communities worldwide communicate in multiple languages, with English usually being one of them. Hinglish is Code-Mixed text that combines Hindi and English and is written in Roman script. This paper aims to determine the factors that influence the quality of system-generated Code-Mixed text. For the HinglishEval task, the proposed model uses multilingual BERT to measure the similarity between synthetically generated and human-generated sentences and, from that similarity, predicts the quality of the synthetically generated Hinglish sentences.
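The similarity step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how sentence vectors are pooled or which similarity measure is used, so mean pooling and cosine similarity are assumptions here, as are the helper names `mean_pool` and `cosine_similarity`. The toy 3-dimensional vectors stand in for multilingual BERT token embeddings, which a real pipeline would obtain from the model.

```python
import math


def mean_pool(token_vectors):
    """Average equal-length token vectors into one sentence vector (assumed pooling)."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy token vectors (hypothetical values, not real model outputs).
synthetic_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
human_tokens = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

similarity = cosine_similarity(mean_pool(synthetic_tokens), mean_pool(human_tokens))
print(similarity)
```

In a setup like this, the similarity score would be one input feature to a downstream quality predictor rather than the quality rating itself.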
Anthology ID:
2022.inlg-genchal.6
Volume:
Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges
Month:
July
Year:
2022
Address:
Waterville, Maine, USA and virtual meeting
Editors:
Samira Shaikh, Thiago Ferreira, Amanda Stent
Venue:
INLG
SIG:
SIGGEN
Publisher:
Association for Computational Linguistics
Pages:
35–38
URL:
https://aclanthology.org/2022.inlg-genchal.6
Cite (ACL):
Shaz Furniturewala, Vijay Kumari, Amulya Ratna Dash, Hriday Kedia, and Yashvardhan Sharma. 2022. BITS Pilani at HinglishEval: Quality Evaluation for Code-Mixed Hinglish Text Using Transformers. In Proceedings of the 15th International Conference on Natural Language Generation: Generation Challenges, pages 35–38, Waterville, Maine, USA and virtual meeting. Association for Computational Linguistics.
Cite (Informal):
BITS Pilani at HinglishEval: Quality Evaluation for Code-Mixed Hinglish Text Using Transformers (Furniturewala et al., INLG 2022)
PDF:
https://aclanthology.org/2022.inlg-genchal.6.pdf