Australasian Language Technology Association Workshop (2020)


Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association

Maria Kim | Daniel Beck | Meladel Mistica

Domain Adaptative Causality Encoder
Farhad Moghimifar | Gholamreza Haffari | Mahsa Baktashmotlagh

Automated discovery of causal relationships from text is a challenging task. Current approaches, which are mainly based on the extraction of low-level relations among individual events, are limited by the shortage of publicly available labelled data. Therefore, the resulting models perform poorly when applied to a distributionally different domain for which labelled data did not exist at the time of training. To overcome this limitation, in this paper, we leverage the characteristics of dependency trees and adversarial learning to address the tasks of adaptive causality identification and localisation. The term adaptive is used since the training and test data come from two distributionally different datasets, a setting which, to the best of our knowledge, this work is the first to address. Moreover, we present a new causality dataset, namely MedCaus, which integrates all types of causality in the text. Our experiments on four different benchmark causality datasets demonstrate the superiority of our approach over the existing baselines, with improvements of up to 7%, on the tasks of identification and localisation of causal relations from text.

Automated Detection of Cyberbullying Against Women and Immigrants and Cross-domain Adaptability
Thushari Atapattu | Mahen Herath | Georgia Zhang | Katrina Falkner

Cyberbullying is a prevalent and growing social problem due to the surge of social media technology usage. Minorities, women, and adolescents are among the common victims of cyberbullying. Despite the advancement of NLP technologies, automated cyberbullying detection remains challenging. This paper focuses on advancing the technology using state-of-the-art NLP techniques. We use a Twitter dataset from SemEval 2019 - Task 5 (HatEval) on hate speech against women and immigrants. Our best performing ensemble model, based on DistilBERT, achieved F1 scores of 0.73 and 0.74 in the tasks of classifying hate speech (Task A) and aggressiveness and target (Task B) respectively. We adapt the ensemble model developed for Task A to classify offensive language in external datasets and achieve an F1 score of ~0.7 on three benchmark datasets, a promising result for cross-domain adaptability. We conduct a qualitative analysis of misclassified tweets to provide insightful recommendations for future cyberbullying research.

The Influence of Background Data Size on the Performance of a Score-based Likelihood Ratio System: A Case of Forensic Text Comparison
Shunichi Ishihara

This study investigates the robustness and stability of a likelihood ratio-based (LR-based) forensic text comparison (FTC) system against the size of background population data. The focus is on a score-based approach for estimating authorship LRs. Each document is represented with a bag-of-words model, and the Cosine distance is used as the score-generating function. A set of population data that differed in the number of scores was synthesised 20 times using the Monte Carlo simulation technique. The FTC system’s performance with different population sizes was evaluated by a gradient metric of the log-LR cost (Cllr). The experimental results revealed two outcomes: 1) the score-based approach is rather robust against a small population size, in that, with the scores obtained from 40 to 60 authors in the database, the stability and performance of the system become fairly comparable to those of the system with the maximum number of authors (720); and 2) poor performance in terms of Cllr, which occurred because of limited background population data, is largely due to poor calibration. The results also indicated that the score-based approach is more robust against data scarcity than the feature-based approach; however, this finding requires further study.
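
As a rough illustration of the score-generating step described in this abstract, the sketch below computes the Cosine distance between two bag-of-words document vectors. It is a minimal sketch only: the whitespace tokenisation and the function names `bag_of_words` and `cosine_distance` are assumptions made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lower-cased, whitespace-separated tokens (a simplifying assumption)."""
    return Counter(text.lower().split())


def cosine_distance(doc_a, doc_b):
    """Return 1 - cosine similarity between the word-count vectors of two documents."""
    a, b = bag_of_words(doc_a), bag_of_words(doc_b)
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare an empty document")
    return 1.0 - dot / (norm_a * norm_b)
```

In a score-based system, a distance like this is the score; turning scores into LRs requires additional modelling of same-author and different-author score distributions, which this sketch does not cover.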

Feature-Based Forensic Text Comparison Using a Poisson Model for Likelihood Ratio Estimation
Michael Carne | Shunichi Ishihara

Score-based and feature-based methods are the two main approaches for estimating a forensic likelihood ratio (LR) quantifying the strength of evidence. In this forensic text comparison (FTC) study, a score-based method using the Cosine distance is compared with a feature-based method built on a Poisson model, with texts collected from 2,157 authors. Distance measures (e.g. Burrows’s Delta, Cosine distance) are a standard tool in authorship attribution studies. Thus, the implementation of a score-based method using a distance measure is naturally the first step for estimating LRs for textual evidence. However, textual data often violates the statistical assumptions underlying distance-based models. Furthermore, such models only assess the similarity, not the typicality, of the objects (i.e. documents) under comparison. A Poisson model is theoretically more appropriate than distance-based measures for authorship attribution, but it has never been tested with linguistic text evidence within the LR framework. The log-LR cost (Cllr) was used to assess the performance of the two methods. This study demonstrates that: (1) the feature-based method outperforms the score-based method by a Cllr value of ca. 0.09 under the best-performing settings; and (2) the performance of the feature-based method can be further improved by feature selection.
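
The log-LR cost used as the evaluation metric here can be sketched as follows, using the commonly cited definition of Cllr: half the sum of the mean of log2(1 + 1/LR) over same-source comparisons and the mean of log2(1 + LR) over different-source comparisons. The function name `cllr` and its argument names are illustrative assumptions, not the authors' code.

```python
import math


def cllr(same_source_lrs, different_source_lrs):
    """Log-LR cost: lower is better; a system that always outputs LR = 1 scores 1.0."""
    if not same_source_lrs or not different_source_lrs:
        raise ValueError("both lists of likelihood ratios must be non-empty")
    # Penalise small LRs when the two texts really share an author.
    ss_cost = sum(math.log2(1 + 1 / lr) for lr in same_source_lrs) / len(same_source_lrs)
    # Penalise large LRs when the two texts come from different authors.
    ds_cost = sum(math.log2(1 + lr) for lr in different_source_lrs) / len(different_source_lrs)
    return 0.5 * (ss_cost + ds_cost)
```

Under this definition, values below 1 indicate that the system provides useful information, and values above 1 indicate misleading LRs.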

Modelling Verbal Morphology in Nen
Saliha Muradoglu | Nicholas Evans | Ekaterina Vylomova

Nen verbal morphology is particularly complex; a transitive verb can take up to 1,740 unique forms. The combined effect of having a large combinatoric space and a low-resource setting amplifies the need for NLP tools. Nen morphology utilises distributed exponence, a non-trivial means of mapping form to meaning. In this paper, we attempt to model Nen verbal morphology using state-of-the-art machine learning models for morphological reinflection. We explore and categorise the types of errors these systems generate. Our results show sensitivity to training data composition; different distributions of verb type yield different accuracies (patterning with E-complexity). We also demonstrate the types of patterns that can be inferred from the training data, through the case study of syncretism.

An Automatic Vowel Space Generator for Language Learner Pronunciation Acquisition and Correction
Xinyuan Chao | Charbel El-Khaissi | Nicholas Kuo | Priscilla Kan John | Hanna Suominen

Speech visualisations are known to help language learners to acquire correct pronunciation and promote a better study experience. We present a two-step approach based on two established techniques to display tongue tip movements of an acoustic speech signal on a vowel space plot. First, we use the Energy Entropy Ratio to extract vowels; then, we apply the Linear Predictive Coding root method to estimate Formant 1 and Formant 2. We collected acoustic data from one Modern Standard Arabic (MSA) lecturer and four MSA students. Our proof of concept was able to reflect differences between the tongue tip movements of a native MSA speaker and those of an MSA language learner. This paper addresses principal methods for generating features that reflect bio-physiological features of speech and thus facilitates an approach that can be generally adapted to languages other than MSA.

ABSA-Bench: Towards the Unified Evaluation of Aspect-based Sentiment Analysis Research
Abhishek Das | Wei Emma Zhang

Aspect-Based Sentiment Analysis (ABSA) has gained much attention in recent years. It is the task of identifying fine-grained opinion polarity towards a specific aspect associated with a given target. However, there is no benchmarking platform that provides a unified environment with consistent evaluation criteria for ABSA, which makes fair comparisons difficult. In this work, we address this issue and define a benchmark, ABSA-Bench, by unifying the evaluation protocols and the pre-processed publicly available datasets in a Web-based platform. ABSA-Bench provides two means of evaluation, allowing participants to submit either their predictions or their models for online evaluation. Performances are ranked on the leaderboard, and a discussion forum is supported to serve as a collaborative platform for academics and researchers to discuss queries.

A machine-learning based model to identify PhD-level skills in job ads
Li’An Chen | Inger Mewburn | Hanna Suominen

Around 60% of doctoral graduates worldwide end up working in industry rather than academia. There have been calls to more closely align the PhD curriculum with the needs of industry, but an evidence base is lacking to inform these changes. We need to find better ways to understand what industry employers really want from doctoral graduates. One good source of data is job advertisements where employers provide a ‘wish list’ of skills and expertise. In this paper, a machine learning-natural language processing (ML-NLP) based approach was used to explore and extract skill requirements from research-intensive job advertisements suitable for PhD graduates. The model developed for detecting skill requirements in job ads was driven by SVM. The experimental results showed that the ML-NLP approach had the potential to replicate manual efforts in understanding job requirements of PhD graduates. Our model offers a new perspective on PhD-level job skill requirements.

Learning Causal Bayesian Networks from Text
Farhad Moghimifar | Afshin Rahimi | Mahsa Baktashmotlagh | Xue Li

Causal relationships form the basis for reasoning and decision-making in Artificial Intelligence systems. To exploit the large volume of textual data available today, the automatic discovery of causal relationships from text has emerged as a significant challenge in recent years. Existing approaches in this realm are limited to the extraction of low-level relations among individual events. To overcome these limitations, in this paper, we propose a method for automatic inference of causal relationships from human-written language at the conceptual level. To this end, we leverage the characteristics of the hierarchy of concepts and linguistic variables created from text, and represent the extracted causal relationships in the form of a Causal Bayesian Network. Our experiments demonstrate the superiority of our approach over the existing approaches in inferring complex causal reasoning from text.

Benchmarking of Transformer-Based Pre-Trained Models on Social Media Text Classification Datasets
Yuting Guo | Xiangjue Dong | Mohammed Ali Al-Garadi | Abeed Sarker | Cecile Paris | Diego Mollá Aliod

Free text data from social media is now widely used in natural language processing research, and one of the most common machine learning tasks performed on this data is classification. Generally speaking, performances of supervised classification algorithms on social media datasets are lower than those on texts from other sources, but recently-proposed transformer-based models have considerably improved upon legacy state-of-the-art systems. Currently, there is no study that compares the performances of different variants of transformer-based models on a wide range of social media text classification datasets. In this paper, we benchmark the performances of transformer-based pre-trained models on 25 social media text classification datasets, 6 of which are health-related. We compare three pre-trained language models, RoBERTa-base, BERTweet and ClinicalBioBERT in terms of classification accuracy. Our experiments show that RoBERTa-base and BERTweet perform comparably on most datasets, and considerably better than ClinicalBioBERT, even on health-related datasets.

Pandemic Literature Search: Finding Information on COVID-19
Vincent Nguyen | Maciek Rybinski | Sarvnaz Karimi | Zhenchang Xing

Finding information related to a pandemic of a novel disease raises new challenges for information seeking and retrieval, as the new information becomes available gradually. We investigate how to better rank information for pandemic information retrieval. We experiment with different ranking algorithms and propose a novel end-to-end method for neural retrieval, and demonstrate its effectiveness on the TREC COVID search. This work could lead to a search system that aids scientists, clinicians, policymakers and others in finding reliable answers from the scientific literature.

Information Extraction from Legal Documents: A Study in the Context of Common Law Court Judgements
Meladel Mistica | Geordie Z. Zhang | Hui Chia | Kabir Manandhar Shrestha | Rohit Kumar Gupta | Saket Khandelwal | Jeannie Paterson | Timothy Baldwin | Daniel Beck

‘Common Law’ judicial systems follow the doctrine of precedent, which means the legal principles articulated in court judgements are binding in subsequent cases in lower courts. For this reason, lawyers must search prior judgements for the legal principles that are relevant to their case. The difficulty for those within the legal profession is that the information that they are looking for may be contained within a few paragraphs or sentences, but those few paragraphs may be buried within a hundred-page document. In this study, we create a schema based on the relevant information that legal professionals seek within judgements and perform text classification based on it, with the aim of not only assisting lawyers in researching cases, but eventually enabling large-scale analysis of legal judgements to find trends in court outcomes over time.

Convolutional and Recurrent Neural Networks for Spoken Emotion Recognition
Aaron Keesing | Ian Watson | Michael Witbrock

We test four models proposed in the speech emotion recognition (SER) literature on 15 public and academic-licensed datasets in speaker-independent cross-validation. Results indicate differences in the performance of the models, which are partly dependent on the dataset and features used. We also show that a standard utterance-level feature set still performs competitively with neural models on some datasets. This work serves as a starting point for future model comparisons, in addition to open-sourcing the testing code.

Popularity Prediction of Online Petitions using a Multimodal Deep Regression Model
Kotaro Kitayama | Shivashankar Subramanian | Timothy Baldwin

Online petitions offer a mechanism for people to initiate a request for change and gather support from others to demonstrate support for the cause. In this work, we model the task of petition popularity using both text and image representations across four different languages, and including petition metadata. We evaluate our proposed approach using a dataset of 75k petitions from, and find strong complementarity between text and images.

Exploring Looping Effects in RNN-based Architectures
Andrei Shcherbakov | Saliha Muradoglu | Ekaterina Vylomova

The paper investigates repetitive loops, a common problem in contemporary text generation systems (such as machine translation, language modelling, and morphological inflection). More specifically, we conduct a study on neural models with recurrent units by explicitly altering their decoder internal state. We use the task of morphological reinflection as a proxy to study the effects of the changes. Our results show that the probability of the occurrence of repetitive loops is significantly reduced by the introduction of an extra neural decoder output. The output should be specifically trained to produce a gradually increasing value upon generation of each character of a given sequence. We also explored variations of the technique and found that feeding the extra output back to the decoder amplifies the positive effects.

Transformer Semantic Parsing
Gabriela Ferraro | Hanna Suominen

In neural semantic parsing, sentences are mapped to meaning representations using encoder-decoder frameworks. In this paper, we propose to apply the Transformer architecture, instead of recurrent neural networks, to this task. Experiments on two data sets from different domains and with different levels of difficulty show that our model achieved better results than strong baselines in certain settings and competitive results across all our experiments.

Overview of the 2020 ALTA Shared Task: Assess Human Behaviour
Diego Mollá

The 2020 ALTA shared task is the 11th instance of a series of shared tasks organised by ALTA since 2010. The task is to classify texts posted in social media according to human judgements expressed in them. The data used for this task is a subset of SemEval 2018 AIT DISC, which has been annotated by domain experts for this task. In this paper we introduce the task, describe the data and present the results of participating systems.

Automatically Predicting Judgement Dimensions of Human Behaviour
Segun Taofeek Aroyehun | Alexander Gelbukh

This paper describes our submission to the ALTA-2020 shared task on assessing behaviour from short text. We evaluate the effectiveness of traditional machine learning and recent pre-trained transformer models. Our submission with the RoBERTa-large model and a prediction threshold achieved first place on the private leaderboard.

Classifying Judgements using Transfer Learning
Pradeesh Parameswaran | Andrew Trotman | Veronica Liesaputra | David Eyers

We describe our method for classifying short texts into the APPRAISAL framework, work we conducted as part of the ALTA 2020 shared task. We tackled this problem using transfer learning. Our team, “orangutanV2”, placed equal first in the shared task, with a mean F1-score of 0.1026 on the private data set.

Human Behavior Assessment using Ensemble Models
Abdullah Faiz Ur Rahman Khilji | Rituparna Khaund | Utkarsh Sinha

Behavioral analysis is a pertinent step in today’s automated age. It is important to judge a statement on a variety of parameters before reaching a valid conclusion. In today’s world of technology and automation, natural language processing tools have benefited from growing access to data in order to analyze the context and scenario. A better understanding of human behaviors would empower a range of automated tools to provide users with a customized experience. For precise analysis, behavior understanding is important. We have experimented with various machine learning techniques, and have obtained a maximum private score of 0.1033 with a public score of 0.1733. The methods are described as part of the ALTA 2020 shared task. In this work, we list our results and the challenges faced in solving the problem of human behavior assessment.