Aili Shen


2022

Optimising Equal Opportunity Fairness in Model Training
Aili Shen | Xudong Han | Trevor Cohn | Timothy Baldwin | Lea Frermann
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Real-world datasets often encode stereotypes and societal biases. Such biases can be implicitly captured by trained models, leading to biased predictions and exacerbating existing societal preconceptions. Existing debiasing methods, such as adversarial training and removing protected information from representations, have been shown to reduce bias. However, a disconnect between fairness criteria and training objectives makes it difficult to reason theoretically about the effectiveness of different techniques. In this work, we propose two novel training objectives which directly optimise for the widely-used criterion of equal opportunity, and show that they are effective in reducing bias while maintaining high performance over two classification tasks.

2021

Evaluating Hierarchical Document Categorisation
Qian Sun | Aili Shen | Hiyori Yoshikawa | Chunpeng Ma | Daniel Beck | Tomoya Iwakura | Timothy Baldwin
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

Hierarchical document categorisation is a special case of multi-label document categorisation, where there is a taxonomic hierarchy among the labels. While various approaches have been proposed for hierarchical document categorisation, there is no standard benchmark dataset, resulting in different methods being evaluated independently and there being no empirical consensus on what methods perform best. In this work, we examine different combinations of neural text encoders and hierarchical methods in an end-to-end framework, and evaluate over three datasets. We find that the performance of hierarchical document categorisation is determined not only by how the hierarchical information is modelled, but also by the structure of the label hierarchy and class distribution.

Evaluating Document Coherence Modeling
Aili Shen | Meladel Mistica | Bahar Salehi | Hang Li | Timothy Baldwin | Jianzhong Qi
Transactions of the Association for Computational Linguistics, Volume 9

While pretrained language models (LMs) have driven impressive gains over morpho-syntactic and semantic tasks, their ability to model discourse and pragmatic phenomena is less clear. As a step towards a better understanding of their discourse modeling capabilities, we propose a sentence intrusion detection task. We examine the performance of a broad range of pretrained LMs on this detection task for English. Lacking a dataset for the task, we introduce INSteD, a novel intruder sentence detection dataset, containing 170,000+ documents constructed from English Wikipedia and CNN news articles. Our experiments show that pretrained LMs perform impressively in in-domain evaluation, but experience a substantial drop in the cross-domain setting, indicating limited generalization capacity. Further results over a novel linguistic probe dataset show that there is substantial room for improvement, especially in the cross-domain setting.

A Simple yet Effective Method for Sentence Ordering
Aili Shen | Timothy Baldwin
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Sentence ordering is the task of arranging a given bag of sentences so as to maximise the coherence of the overall text. In this work, we propose a simple yet effective training method that improves the capacity of models to capture overall text coherence based on training over pairs of sentences/segments. Experimental results show the superiority of our proposed method in in- and cross-domain settings. The utility of our method is also verified over a multi-document summarisation task.

On the (In)Effectiveness of Images for Text Classification
Chunpeng Ma | Aili Shen | Hiyori Yoshikawa | Tomoya Iwakura | Daniel Beck | Timothy Baldwin
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether images improve NLP tasks or not. One confounding effect has been that previous NLP research has generally focused on sophisticated tasks (in varying settings), generally applied to English only. We focus on text classification, in the context of assigning named entity classes to a given Wikipedia page, where images generally complement the text and the Wikipedia page can be in one of a number of different languages. Our experiments across a range of languages show that images complement NLP models (including BERT) trained without external pre-training, but when combined with BERT models pre-trained on large-scale external data, images contribute nothing.

2019

Modelling Uncertainty in Collaborative Document Quality Assessment
Aili Shen | Daniel Beck | Bahar Salehi | Jianzhong Qi | Timothy Baldwin
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

In the context of document quality assessment, previous work has mainly focused on predicting the quality of a document relative to a putative gold standard, without paying attention to the subjectivity of this task. To imitate people’s disagreement over inherently subjective tasks such as rating the quality of a Wikipedia article, a document quality assessment system should provide not only a prediction of the article quality but also the uncertainty over its predictions. This motivates us to measure the uncertainty in document quality predictions, in addition to making the label prediction. Experimental results show that both Gaussian processes (GPs) and random forests (RFs) can yield competitive results in predicting the quality of Wikipedia articles, while providing an estimate of uncertainty when there is inconsistency in the quality labels from the Wikipedia contributors. We additionally evaluate our methods in the context of a semi-automated document quality class assignment decision-making process, where there is asymmetric risk associated with overestimates and underestimates of document quality. Our experiments suggest that GPs provide more reliable estimates in this context.

Feature-guided Neural Model Training for Supervised Document Representation Learning
Aili Shen | Bahar Salehi | Jianzhong Qi | Timothy Baldwin
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

2017

A Hybrid Model for Quality Assessment of Wikipedia Articles
Aili Shen | Jianzhong Qi | Timothy Baldwin
Proceedings of the Australasian Language Technology Association Workshop 2017