Xiaoyuan Liu


pdf bib
Sentence-aware Adversarial Meta-Learning for Few-Shot Text Classification
Suhe Wang | Xiaoyuan Liu | Bo Liu | Diwen Dong
Proceedings of the 29th International Conference on Computational Linguistics

Meta-learning has emerged as an effective approach for few-shot text classification. However, current studies fail to realize the importance of the semantic interaction between sentence features and neglect to enhance the generalization ability of the model to new tasks. In this paper, we integrate an adversarial network architecture into the meta-learning system and leverage cost-effective modules to build a novel few-shot classification framework named SaAML. Significantly, our approach can exploit the temporal convolutional network to encourage more discriminative representation learning and explore the attention mechanism to promote more comprehensive feature expression, thus resulting in better adaptation for new classes. Through a series of experiments on four benchmark datasets, we demonstrate that our new framework acquires considerable superiority over state-of-the-art methods in all datasets, increasing the performance of 1-shot classification and 5-shot classification by 7.15% and 2.89%, respectively.


pdf bib
Pretrained Transformers Improve Out-of-Distribution Robustness
Dan Hendrycks | Xiaoyuan Liu | Eric Wallace | Adam Dziedzic | Rishabh Krishnan | Dawn Song
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Although pretrained Transformers such as BERT achieve high accuracy on in-distribution examples, do they generalize to new distributions? We systematically measure out-of-distribution (OOD) generalization for seven NLP datasets by constructing a new robustness benchmark with realistic distribution shifts. We measure the generalization of previous models including bag-of-words models, ConvNets, and LSTMs, and we show that pretrained Transformers’ performance declines are substantially smaller. Pretrained transformers are also more effective at detecting anomalous or OOD examples, while many previous models are frequently worse than chance. We examine which factors affect robustness, finding that larger models are not necessarily more robust, distillation can be harmful, and more diverse pretraining data can enhance robustness. Finally, we show where future work can improve OOD robustness.