Chengyu Huang


pdf bib
Conversation Disentanglement with Bi-Level Contrastive Learning
Chengyu Huang | Zheng Zhang | Hao Fei | Lizi Liao
Findings of the Association for Computational Linguistics: EMNLP 2022

Conversation disentanglement aims to group utterances into detached sessions, which is a fundamental task in processing multi-party conversations. Existing methods have two main drawbacks. First, they overemphasize pairwise utterance relations but pay inadequate attention to the utterance-to-context relation modeling. Second, huge amount of human annotated data is required for training, which is expensive to obtain in practice. To address these issues, we propose a general disentangle model based on bi-level contrastive learning. It brings closer utterances in the same session while encourages each utterance to be near its clustered session prototypes in the representation space. Unlike existing approaches, our disentangle model works in both supervised setting with labeled data and unsupervised setting when no such data is available. The proposed method achieves new state-of-the-art performance on both settings across several public datasets.


pdf bib
Text Augmentation in a Multi-Task View
Jason Wei | Chengyu Huang | Shiqi Xu | Soroush Vosoughi
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Traditional data augmentation aims to increase the coverage of the input distribution by generating augmented examples that strongly resemble original samples in an online fashion where augmented examples dominate training. In this paper, we propose an alternative perspective—a multi-task view (MTV) of data augmentation—in which the primary task trains on original examples and the auxiliary task trains on augmented examples. In MTV data augmentation, both original and augmented samples are weighted substantively during training, relaxing the constraint that augmented examples must resemble original data and thereby allowing us to apply stronger augmentation functions. In empirical experiments using four common data augmentation techniques on three benchmark text classification datasets, we find that using the MTV leads to higher and more robust performance than traditional augmentation.

pdf bib
Few-Shot Text Classification with Triplet Networks, Data Augmentation, and Curriculum Learning
Jason Wei | Chengyu Huang | Soroush Vosoughi | Yu Cheng | Shiqi Xu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Few-shot text classification is a fundamental NLP task in which a model aims to classify text into a large number of categories, given only a few training examples per category. This paper explores data augmentation—a technique particularly suitable for training with limited data—for this few-shot, highly-multiclass text classification setting. On four diverse text classification tasks, we find that common data augmentation techniques can improve the performance of triplet networks by up to 3.0% on average. To further boost performance, we present a simple training strategy called curriculum data augmentation, which leverages curriculum learning by first training on only original examples and then introducing augmented data as training progresses. We explore a two-stage and a gradual schedule, and find that, compared with standard single-stage training, curriculum data augmentation trains faster, improves performance, and remains robust to high amounts of noising from augmentation.


pdf bib
What Are People Asking About COVID-19? A Question Classification Dataset
Jerry Wei | Chengyu Huang | Soroush Vosoughi | Jason Wei
Proceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020

We present COVID-Q, a set of 1,690 questions about COVID-19 from 13 sources, which we annotate into 15 question categories and 207 question clusters. The most common questions in our dataset asked about transmission, prevention, and societal effects of COVID, and we found that many questions that appeared in multiple sources were not answered by any FAQ websites of reputable organizations such as the CDC and FDA. We post our dataset publicly at For classifying questions into 15 categories, a BERT baseline scored 58.1% accuracy when trained on 20 examples per category, and for a question clustering task, a BERT + triplet loss baseline achieved 49.5% accuracy. We hope COVID-Q can help either for direct use in developing applied systems or as a domain-specific resource for model evaluation.