Xiaoxuan Wang


2023

Learning under Label Proportions for Text Classification
Jatin Chauhan | Xiaoxuan Wang | Wei Wang
Findings of the Association for Computational Linguistics: EMNLP 2023

We present one of the preliminary NLP works under the challenging setup of Learning from Label Proportions (LLP), where the data is provided in an aggregate form called bags and only the proportion of samples in each class is available as the ground truth. This setup is in line with the desired characteristics of training models under privacy settings and weak supervision. By characterizing some irregularities of the most widely used baseline technique, DLLP, we propose a novel formulation that is also robust. This is accompanied by a learnability result that provides a generalization bound under LLP. Combining this formulation with a self-supervised objective, our method achieves better results than the baselines in almost 87% of the experimental configurations, which include large-scale models for both long- and short-range texts across multiple metrics.

2017

Fake news stance detection using stacked ensemble of classifiers
James Thorne | Mingjie Chen | Giorgos Myrianthous | Jiashu Pu | Xiaoxuan Wang | Andreas Vlachos
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

Fake news has become a hotly debated topic in journalism. In this paper, we present our entry to the 2017 Fake News Challenge, which models the detection of fake news as a stance classification task; the entry finished in 11th place on the leaderboard. Our entry is an ensemble system of classifiers developed by students in the context of their coursework. We show how we used the stacking ensemble method for this purpose and obtained classification accuracy exceeding each of the individual models’ performance on the development data. Finally, we discuss aspects of the experimental setup of the challenge.