Shengyi Jiang


Improving English-Arabic Transliteration with Phonemic Memories
Yuanhe Tian | Renze Lou | Xiangyu Pang | Lianxi Wang | Shengyi Jiang | Yan Song
Findings of the Association for Computational Linguistics: EMNLP 2022

Transliteration is an important task in natural language processing (NLP) that aims to convert a name in the source language to the target language without changing its pronunciation. In particular, transliteration from English to Arabic is in high demand in many applications, especially in countries (e.g., the United Arab Emirates (UAE)) where most residents are foreigners but the official language is Arabic. In such a task-oriented scenario, namely transliterating English names to the corresponding Arabic ones, the performance of the transliteration model is highly important. However, most existing neural approaches apply a universal transliteration model with advanced encoders and decoders to the task and pay limited attention to leveraging the phonemic association between English and Arabic to further improve model performance. In this paper, we focus on transliteration of people’s names from English to Arabic for the general public. In doing so, we collect a corpus named EANames by extracting high-quality name pairs from online resources, which better represent the names of the general public than linked Wikipedia entries (which are always names of famous people). We propose a model for English-Arabic transliteration, where a memory module modeling the phonemic association between English and Arabic is used to guide the transliteration process. We run experiments on the collected data, and the results demonstrate the effectiveness of our approach for English-Arabic transliteration.

LaoPLM: Pre-trained Language Models for Lao
Nankai Lin | Yingwen Fu | Chuwei Chen | Ziyu Yang | Shengyi Jiang
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Trained on large corpora, pre-trained language models (PLMs) can capture different levels of concepts in context and hence generate universal language representations. They can benefit multiple downstream natural language processing (NLP) tasks. Although PLMs have been widely used in most NLP applications, especially for high-resource languages such as English, they are under-represented in Lao NLP research. Previous work on Lao has been hampered by the lack of annotated datasets and the sparsity of language resources. In this work, we construct a text classification dataset to alleviate the resource-scarce situation of the Lao language. In addition, we present the first transformer-based PLMs for Lao in four versions: BERT-Small, BERT-Base, ELECTRA-Small, and ELECTRA-Base. Furthermore, we evaluate them on two downstream tasks: part-of-speech (POS) tagging and text classification. Experiments demonstrate the effectiveness of our Lao models. We release our models and datasets to the community, hoping to facilitate the future development of Lao NLP applications.

BERT 4EVER@LT-EDI-ACL2022-Detecting signs of Depression from Social Media:Detecting Depression in Social Media using Prompt-Learning and Word-Emotion Cluster
Xiaotian Lin | Yingwen Fu | Ziyu Yang | Nankai Lin | Shengyi Jiang
Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion

In this paper, we report the solution of the team BERT 4EVER for the LT-EDI-2022 shared task on detecting signs of depression from social media at ACL 2022, which aims to classify YouTube comments into one of the following categories: no, moderate, or severe depression. We model the problem as both a text classification task and a text generation task and propose a separate model for each. To combine the knowledge learned by these two models, we softly fuse their predicted probabilities and then select the label with the highest probability as the final output. In addition, multiple augmentation strategies, such as back translation and adversarial training, are leveraged to improve the model's generalization capability. Experimental results demonstrate the effectiveness of the proposed models and the two augmentation strategies.
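
The soft fusion step described in this abstract can be illustrated with a minimal sketch. It assumes fusion is a weighted average of the two models' class probabilities followed by an argmax; the abstract does not specify the exact fusion rule, so the function names, the `weight_a` parameter, and the label strings below are hypothetical and not taken from the paper.

```python
# Hypothetical label set, paraphrased from the three categories in the abstract.
LABELS = ["no depression", "moderate depression", "severe depression"]


def soft_fuse(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two probability vectors (assumed fusion rule)."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same set of labels")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(probs_a, probs_b)]


def predict(probs_a, probs_b, weight_a=0.5):
    """Return the label with the highest fused probability."""
    fused = soft_fuse(probs_a, probs_b, weight_a)
    best_index = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best_index]


# Example: the classification model leans towards "no depression",
# the generation model towards "moderate depression".
classifier_probs = [0.5, 0.4, 0.1]
generator_probs = [0.1, 0.6, 0.3]
print(predict(classifier_probs, generator_probs))
```

With equal weights, the fused scores favor the class that both models rate reasonably highly rather than the top choice of either model alone.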