Yida Mu


2023

pdf bib
Classification-Aware Neural Topic Model Combined with Interpretable Analysis - for Conflict Classification
Tianyu Liang | Yida Mu | Soonho Kim | Darline Kuate | Julie Lang | Rob Vos | Xingyi Song
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing

A large number of conflict events are affecting the world all the time. In order to analyse such conflict events effectively, this paper presents a Classification-Aware Neural Topic Model (CANTM-IA) for Conflict Information Classification and Topic Discovery. The model provides a reliable interpretation of classification results and discovered topics by introducing interpretability analysis. At the same time, interpretation is introduced into the model architecture to improve the classification performance of the model and to allow interpretation to focus further on the details of the data. Finally, the model architecture is optimised to reduce the complexity of the model.

pdf bib
It’s about Time: Rethinking Evaluation on Rumor Detection Benchmarks using Chronological Splits
Yida Mu | Kalina Bontcheva | Nikolaos Aletras
Findings of the Association for Computational Linguistics: EACL 2023

New events emerge over time influencing the topics of rumors in social media. Current rumor detection benchmarks use random splits as training, development and test sets which typically results in topical overlaps. Consequently, models trained on random splits may not perform well on rumor classification on previously unseen topics due to the temporal concept drift. In this paper, we provide a re-evaluation of classification models on four popular rumor detection benchmarks considering chronological instead of random splits. Our experimental results show that the use of random splits can significantly overestimate predictive performance across all datasets and models. Therefore, we suggest that rumor detection models should always be evaluated using chronological splits for minimizing topical overlaps.

pdf bib
Don’t waste a single annotation: improving single-label classifiers through soft labels
Ben Wu | Yue Li | Yida Mu | Carolina Scarton | Kalina Bontcheva | Xingyi Song
Findings of the Association for Computational Linguistics: EMNLP 2023

In this paper, we address the limitations of the common data annotation and training methods for objective single-label classification tasks. Typically, when annotating such tasks annotators are only asked to provide a single label for each sample and annotator disagreement is discarded when a final hard label is decided through majority voting. We challenge this traditional approach, acknowledging that determining the appropriate label can be difficult due to the ambiguity and lack of context in the data samples. Rather than discarding the information from such ambiguous annotations, our soft label method makes use of them for training. Our findings indicate that additional annotator information, such as confidence, secondary label and disagreement, can be used to effectively generate soft labels. Training classifiers with these soft labels then leads to improved performance and calibration on the hard label test set.