Aili Shen


pdf bib
On the (In)Effectiveness of Images for Text Classification
Chunpeng Ma | Aili Shen | Hiyori Yoshikawa | Tomoya Iwakura | Daniel Beck | Timothy Baldwin
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

Images are core components of multi-modal learning in natural language processing (NLP), and results have varied substantially as to whether images improve NLP tasks or not. One confounding effect has been that previous NLP research has generally focused on sophisticated tasks (in varying settings), generally applied to English only. We focus on text classification, in the context of assigning named entity classes to a given Wikipedia page, where images generally complement the text and the Wikipedia page can be in one of a number of different languages. Our experiments across a range of languages show that images complement NLP models (including BERT) trained without external pre-training, but when combined with BERT models pre-trained on large-scale external data, images contribute nothing.

pdf bib
A Simple yet Effective Method for Sentence Ordering
Aili Shen | Timothy Baldwin
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Sentence ordering is the task of arranging a given bag of sentences so as to maximise the coherence of the overall text. In this work, we propose a simple yet effective training method that improves the capacity of models to capture overall text coherence based on training over pairs of sentences/segments. Experimental results show the superiority of our proposed method in in- and cross-domain settings. The utility of our method is also verified over a multi-document summarisation task.


pdf bib
Modelling Uncertainty in Collaborative Document Quality Assessment
Aili Shen | Daniel Beck | Bahar Salehi | Jianzhong Qi | Timothy Baldwin
Proceedings of the 5th Workshop on Noisy User-generated Text (W-NUT 2019)

In the context of document quality assessment, previous work has mainly focused on predicting the quality of a document relative to a putative gold standard, without paying attention to the subjectivity of this task. To imitate people’s disagreement over inherently subjective tasks such as rating the quality of a Wikipedia article, a document quality assessment system should provide not only a prediction of the article quality but also the uncertainty over its predictions. This motivates us to measure the uncertainty in document quality predictions, in addition to making the label prediction. Experimental results show that both Gaussian processes (GPs) and random forests (RFs) can yield competitive results in predicting the quality of Wikipedia articles, while providing an estimate of uncertainty when there is inconsistency in the quality labels from the Wikipedia contributors. We additionally evaluate our methods in the context of a semi-automated document quality class assignment decision-making process, where there is asymmetric risk associated with overestimates and underestimates of document quality. Our experiments suggest that GPs provide more reliable estimates in this context.

pdf bib
Feature-guided Neural Model Training for Supervised Document Representation Learning
Aili Shen | Bahar Salehi | Jianzhong Qi | Timothy Baldwin
Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association


pdf bib
A Hybrid Model for Quality Assessment of Wikipedia Articles
Aili Shen | Jianzhong Qi | Timothy Baldwin
Proceedings of the Australasian Language Technology Association Workshop 2017