Xiuzhen (Jenny) Zhang

Also published as: Xiuzhen Zhang


Examining Bias in Opinion Summarisation through the Perspective of Opinion Diversity
Nannan Huang | Lin Tian | Haytham Fayek | Xiuzhen Zhang
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Opinion summarisation is a task that aims to condense the information presented in the source documents while retaining the core message and opinions. A summary that only represents the majority opinions will leave the minority opinions unrepresented. In this paper, we use the stance towards a certain target as an opinion. We study bias in opinion summarisation from the perspective of opinion diversity, which measures whether the model-generated summary can cover a diverse set of opinions. In addition, we examine opinion similarity, a measure of how closely related two opinions are in terms of their stance on a given topic, and its relationship with opinion diversity. Through the lens of stances towards a topic, we examine opinion diversity and similarity using three debatable topics under COVID-19. Experimental results on these topics revealed that a higher degree of similarity of opinions did not indicate good diversity or fair coverage of the various opinions originally presented in the source documents. We found that BART and ChatGPT can better capture diverse opinions presented in the source documents.

Task and Sentiment Adaptation for Appraisal Tagging
Lin Tian | Xiuzhen Zhang | Myung Hee Kim | Jennifer Biggs
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

The Appraisal framework in linguistics defines the framework for fine-grained evaluations and opinions and has contributed to sentiment analysis and opinion mining. As developing appraisal-annotated resources requires tagging of several dimensions with distinct semantic taxonomies, it has been primarily conducted manually by human experts through expensive and time-consuming processes. In this paper, we study how to automatically identify and annotate text segments for appraisal. We formulate the problem as a sequence tagging problem and propose novel task and sentiment adapters based on language models for appraisal tagging. Our model, named Adaptive Appraisal (A^2), achieves better performance than baseline adapter-based models and other neural classification models, especially in cross-domain and cross-language settings. Source code for A^2 is available at: https://github.com/ltian678/AA-code.git


DUCK: Rumour Detection on Social Media by Modelling User and Comment Propagation Networks
Lin Tian | Xiuzhen Zhang | Jey Han Lau
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Social media rumours, a form of misinformation, can mislead the public and cause significant economic and social disruption. Motivated by the observation that the user network (which captures who engages with a story) and the comment network (which captures how they react to it) provide complementary signals for rumour detection, in this paper, we propose DUCK (rumour detection with user and comment networks) for rumour detection on social media. We study how to leverage transformers and graph attention networks to jointly model the contents and structure of social media conversations, as well as the network of users who engaged in these conversations. Over four widely used benchmark rumour datasets in English and Chinese, we show that DUCK produces superior performance for detecting rumours, creating a new state-of-the-art. Source code for DUCK is available at: https://github.com/ltian678/DUCK-code.


Measuring Similarity of Opinion-bearing Sentences
Wenyi Tay | Xiuzhen Zhang | Stephen Wan | Sarvnaz Karimi
Proceedings of the Third Workshop on New Frontiers in Summarization

For many NLP applications of online reviews, comparison of two opinion-bearing sentences is key. We argue that, while general purpose text similarity metrics have been applied for this purpose, there has been limited exploration of their applicability to opinion texts. We address this gap in the literature, studying: (1) how humans judge the similarity of pairs of opinion-bearing sentences; and (2) the degree to which existing text similarity metrics, particularly embedding-based ones, correspond to human judgments. We crowdsourced annotations for opinion sentence pairs and our main findings are: (1) annotators tend to agree on whether opinion sentences are similar or different; and (2) embedding-based metrics capture human judgments of “opinion similarity” but not “opinion difference”. Based on our analysis, we identify areas where the current metrics should be improved. We further propose to learn a similarity metric for opinion similarity via fine-tuning the Sentence-BERT sentence-embedding network based on review text and weak supervision by review ratings. Experiments show that our learned metric outperforms existing text similarity metrics and, in particular, shows significantly higher correlations with human annotations for differing opinions.

Evaluation of Review Summaries via Question-Answering
Nannan Huang | Xiuzhen Zhang
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

Summarisation of reviews aims at compressing opinions expressed in multiple review documents into a concise form while still covering the key opinions. Despite the advancement in summarisation models, evaluation metrics for opinionated text summaries lag behind and still rely on lexical-matching metrics such as ROUGE. In this paper, we propose to use the question-answering (QA) approach to evaluate summaries of opinions in reviews. We propose to identify opinion-bearing text spans in the reference summary to generate QA pairs so as to capture salient opinions. A QA model is then employed to probe the candidate summary to evaluate information overlap between candidate and reference summaries. We show that our metric RunQA, Review Summary Evaluation via Question Answering, correlates well with human judgments in terms of coverage and focus of information. Finally, we design an adversarial task and demonstrate that the proposed approach is more robust than metrics in the literature for ranking summaries.

Robustness Analysis of Grover for Machine-Generated News Detection
Rinaldo Gagiano | Maria Myung-Hee Kim | Xiuzhen Zhang | Jennifer Biggs
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

Advancements in Natural Language Generation have raised concerns about its potential misuse for deep fake news. Grover is a model for both generation and detection of neural fake news. While its performance on automatically discriminating neural fake news surpassed GPT-2 and BERT, Grover could face a variety of adversarial attacks to deceive detection. In this work, we present an investigation of Grover’s susceptibility to adversarial attacks such as character-level and word-level perturbations. The experimental results show that even a single character alteration can cause Grover to fail, affecting up to 97% of target articles with unlimited attack attempts, exposing a lack of robustness. We further analyse these misclassified cases to highlight affected words, identify vulnerability within Grover’s encoder, and perform a novel visualisation of cumulative classification scores to assist in interpreting model behaviour.

Does QA-based intermediate training help fine-tuning language models for text classification?
Shiwei Zhang | Xiuzhen Zhang
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

Fine-tuning pre-trained language models for downstream tasks has become a norm for NLP. Recently it has been found that intermediate training can improve the performance of fine-tuned language models on target tasks, and that high-level inference tasks such as Question Answering (QA) tend to work best as intermediate tasks. However, it is not clear whether intermediate training generally benefits various language models. In this paper, using the SQuAD-2.0 QA task for intermediate training for target text classification tasks, we experimented on eight tasks for single-sequence classification and eight tasks for sequence-pair classification using two base and two compact language models. Our experiments show that QA-based intermediate training generates varying transfer performance across different language models, except for similar QA tasks.


Red-faced ROUGE: Examining the Suitability of ROUGE for Opinion Summary Evaluation
Wenyi Tay | Aditya Joshi | Xiuzhen Zhang | Sarvnaz Karimi | Stephen Wan
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

One of the most common metrics to automatically evaluate opinion summaries is ROUGE, a metric developed for text summarisation. ROUGE counts the overlap of words or word units between a candidate summary and reference summaries. This formulation treats all words in the reference summary equally. In opinion summaries, however, not all words in the reference are equally important. Opinion summarisation requires correctly pairing two types of semantic information: (1) aspect or opinion target; and (2) polarity of candidate and reference summaries. We investigate the suitability of ROUGE for evaluating opinion summaries of online reviews. Using three simulation-based experiments, we evaluate the behaviour of ROUGE for opinion summarisation on the ability to match aspect and polarity. We show that ROUGE cannot distinguish opinion summaries of similar or opposite polarities for the same aspect. Moreover, ROUGE scores have significant variance under different configuration settings. As a result, we present three recommendations for future work that uses ROUGE to evaluate opinion summarisation.


Neural Sparse Topical Coding
Min Peng | Qianqian Xie | Yanchun Zhang | Hua Wang | Xiuzhen Zhang | Jimin Huang | Gang Tian
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Topic models with sparsity enhancement have been proven to be effective at learning discriminative and coherent latent topics of short texts, which is critical to many scientific and engineering applications. However, the extensions of these models require carefully tailored graphical models and re-deduced inference algorithms, limiting their variations and applications. We propose a novel sparsity-enhanced topic model, Neural Sparse Topical Coding (NSTC), based on a sparsity-enhanced topic model called Sparse Topical Coding (STC). It focuses on replacing the complex inference process with backpropagation, which makes the model easy to extend. Moreover, the external semantic information of words in word embeddings is incorporated to improve the representation of short texts. To illustrate the flexibility offered by the neural network based framework, we present three extensions based on NSTC without re-deduced inference algorithms. Experiments on Web Snippet and 20Newsgroups datasets demonstrate that our models outperform existing methods.

Proceedings of the Australasian Language Technology Association Workshop 2018
Sunghwan Mac Kim | Xiuzhen (Jenny) Zhang
Proceedings of the Australasian Language Technology Association Workshop 2018