Sarvnaz Karimi


2021

Measuring Similarity of Opinion-bearing Sentences
Wenyi Tay | Xiuzhen Zhang | Stephen Wan | Sarvnaz Karimi
Proceedings of the Third Workshop on New Frontiers in Summarization

For many NLP applications of online reviews, comparison of two opinion-bearing sentences is key. We argue that, while general purpose text similarity metrics have been applied for this purpose, there has been limited exploration of their applicability to opinion texts. We address this gap in the literature, studying: (1) how humans judge the similarity of pairs of opinion-bearing sentences; and, (2) the degree to which existing text similarity metrics, particularly embedding-based ones, correspond to human judgments. We crowdsourced annotations for opinion sentence pairs and our main findings are: (1) annotators tend to agree on whether or not opinion sentences are similar or different; and (2) embedding-based metrics capture human judgments of “opinion similarity” but not “opinion difference”. Based on our analysis, we identify areas where the current metrics should be improved. We further propose to learn a similarity metric for opinion similarity via fine-tuning the Sentence-BERT sentence-embedding network based on review text and weak supervision by review ratings. Experiments show that our learned metric outperforms existing text similarity metrics and, in particular, shows significantly higher correlations with human annotations for differing opinions.

Combining Shallow and Deep Representations for Text-Pair Classification
Vincent Nguyen | Sarvnaz Karimi | Zhenchang Xing
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

Text-pair classification is the task of determining the class relationship between two sentences. It is embedded in several tasks such as paraphrase identification and duplicate question detection. Contemporary methods use fine-tuned transformer encoder semantic representations of the classification token in the text-pair sequence from the transformer’s final layer for class prediction. However, research has shown that earlier parts of the network learn shallow features, such as syntax and structure, which existing methods do not directly exploit. We propose a novel convolution-based decoder for transformer-based architectures that maximizes the use of encoder hidden features for text-pair classification. Our model exploits hidden representations within the transformer-based architecture. It outperforms a transformer encoder baseline on average by 50% (relative F1-score) on six datasets from the medical, software engineering, and open domains. Our work shows that transformer-based models can improve text-pair classification by modifying the fine-tuning step to exploit shallow features while improving model generalization, with only a slight reduction in efficiency.

Cross-Domain Language Modeling: An Empirical Investigation
Vincent Nguyen | Sarvnaz Karimi | Maciej Rybinski | Zhenchang Xing
Proceedings of the 19th Annual Workshop of the Australasian Language Technology Association

Transformer encoder models exhibit strong performance in single-domain applications. However, in a cross-domain situation, using a sub-word vocabulary model results in sub-word overlap. This is an issue when there is an overlap between sub-words that share no semantic similarity between domains. We hypothesize that alleviating this overlap allows for a more effective modeling of multi-domain tasks; we consider the biomedical and general domains in this paper. We present a study on reducing sub-word overlap by scaling the vocabulary size in a Transformer encoder model while pretraining with multiple domains. We observe a significant increase in downstream performance in the general-biomedical cross-domain from a reduction in sub-word overlap.

2020

An Effective Transition-based Model for Discontinuous NER
Xiang Dai | Sarvnaz Karimi | Ben Hachey | Cecile Paris
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Unlike widely used Named Entity Recognition (NER) data sets in generic domains, biomedical NER data sets often contain mentions consisting of discontinuous spans. Conventional sequence tagging techniques encode Markov assumptions that are efficient but preclude recovery of these mentions. We propose a simple, effective transition-based model with generic neural encoding for discontinuous NER. Through extensive experiments on three biomedical data sets, we show that our model can effectively recognize discontinuous mentions without sacrificing the accuracy on continuous mentions.

Pandemic Literature Search: Finding Information on COVID-19
Vincent Nguyen | Maciek Rybinski | Sarvnaz Karimi | Zhenchang Xing
Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association

Finding information related to a pandemic of a novel disease raises new challenges for information seeking and retrieval, as the new information becomes available gradually. We investigate how to better rank information for pandemic information retrieval. We experiment with different ranking algorithms and propose a novel end-to-end method for neural retrieval, and demonstrate its effectiveness on the TREC COVID search. This work could lead to a search system that aids scientists, clinicians, policymakers and others in finding reliable answers from the scientific literature.

Cost-effective Selection of Pretraining Data: A Case Study of Pretraining BERT on Social Media
Xiang Dai | Sarvnaz Karimi | Ben Hachey | Cecile Paris
Findings of the Association for Computational Linguistics: EMNLP 2020

Recent studies on domain-specific BERT models show that effectiveness on downstream tasks can be improved when models are pretrained on in-domain data. Often, the pretraining data used in these models are selected based on their subject matter, e.g., biology or computer science. Given the range of applications using social media text, and its unique language variety, we pretrain two models on tweets and forum text respectively, and empirically demonstrate the effectiveness of these two resources. In addition, we investigate how similarity measures can be used to nominate in-domain pretraining data. We publicly release our pretrained models at https://bit.ly/35RpTf0.

2019

Red-faced ROUGE: Examining the Suitability of ROUGE for Opinion Summary Evaluation
Wenyi Tay | Aditya Joshi | Xiuzhen Zhang | Sarvnaz Karimi | Stephen Wan
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

One of the most common metrics to automatically evaluate opinion summaries is ROUGE, a metric developed for text summarisation. ROUGE counts the overlap of words or word units between a candidate summary and reference summaries. This formulation treats all words in the reference summary equally. In opinion summaries, however, not all words in the reference are equally important. Opinion summarisation requires correctly pairing two types of semantic information: (1) aspect or opinion target; and (2) polarity of candidate and reference summaries. We investigate the suitability of ROUGE for evaluating opinion summaries of online reviews. Using three simulation-based experiments, we evaluate the behaviour of ROUGE for opinion summarisation on the ability to match aspect and polarity. We show that ROUGE cannot distinguish opinion summaries of similar or opposite polarities for the same aspect. Moreover, ROUGE scores have significant variance under different configuration settings. As a result, we present three recommendations for future work that uses ROUGE to evaluate opinion summarisation.
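
The overlap counting described in this abstract can be illustrated with a minimal sketch. It is not the paper's experimental setup or the official ROUGE toolkit. It assumes a simplified unigram recall with whitespace tokenisation, and the function name `rouge_n_recall` and the example sentences are hypothetical.

```python
from collections import Counter


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N recall: fraction of reference n-grams found in the candidate.

    Assumption: lowercased whitespace tokenisation, no stemming or stopword removal.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clipped overlap: each reference n-gram is matched at most as often as it occurs.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical opinion sentences about the same aspect (battery life).
reference = "the battery life is good"
same_polarity = "the battery life is great"
opposite_polarity = "the battery life is bad"

score_same = rouge_n_recall(same_polarity, reference)
score_opposite = rouge_n_recall(opposite_polarity, reference)
print(score_same, score_opposite)
```

Under these assumptions both candidates match four of the five reference unigrams, so the agreeing and the contradicting opinion receive the same score. This is the kind of behaviour the abstract reports for opinions on the same aspect.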

Does Multi-Task Learning Always Help?: An Evaluation on Health Informatics
Aditya Joshi | Sarvnaz Karimi | Ross Sparks | Cecile Paris | C Raina MacIntyre
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

Multi-Task Learning (MTL) has been an attractive approach to deal with limited labeled datasets or leverage related tasks, for a variety of NLP problems. We examine the benefit of MTL for three specific pairs of health informatics tasks that deal with: (a) overlapping symptoms for the same classification problem (personal health mention classification for influenza and for a set of symptoms); (b) overlapping medical concepts for related classification problems (vaccine usage and drug usage detection); and, (c) related classification problems (vaccination intent and vaccination relevance detection). We experiment with a simple neural architecture: a shared layer followed by task-specific dense layers. The novelty of this work is that it compares alternatives for shared layers for these pairs of tasks. While our observations agree with the promise of MTL as compared to single-task learning, for health informatics, we show that the benefit also comes with caveats in terms of the choice of shared layers and the relatedness between the participating tasks.

Investigating the Effect of Lexical Segmentation in Transformer-based Models on Medical Datasets
Vincent Nguyen | Sarvnaz Karimi | Zhenchang Xing
Proceedings of the 17th Annual Workshop of the Australasian Language Technology Association

A Comparison of Word-based and Context-based Representations for Classification Problems in Health Informatics
Aditya Joshi | Sarvnaz Karimi | Ross Sparks | Cecile Paris | C Raina MacIntyre
Proceedings of the 18th BioNLP Workshop and Shared Task

Distributed representations of text can be used as features when training a statistical classifier. These representations may be created as a composition of word vectors or as context-based sentence vectors. We compare the two kinds of representations (word versus context) for three classification problems: influenza infection classification, drug usage classification and personal health mention classification. For statistical classifiers trained for each of these problems, context-based representations based on ELMo, Universal Sentence Encoder, Neural-Net Language Model and FLAIR are better than Word2Vec, GloVe and the two adapted using the MESH ontology. There is an improvement of 2-4% in the accuracy when these context-based representations are used instead of word-based representations.

ANU-CSIRO at MEDIQA 2019: Question Answering Using Deep Contextual Knowledge
Vincent Nguyen | Sarvnaz Karimi | Zhenchang Xing
Proceedings of the 18th BioNLP Workshop and Shared Task

We report on our system for textual inference and question entailment in the medical domain for the ACL BioNLP 2019 Shared Task, MEDIQA. Textual inference is the task of finding the semantic relationships between pairs of text. Question entailment involves identifying pairs of questions which have similar semantic content. To improve upon medical natural language inference and question entailment approaches to further medical question answering, we propose a system that incorporates open-domain and biomedical domain approaches to improve semantic understanding and ambiguity resolution. Our models achieve 80% accuracy on medical natural language inference (6.5% absolute improvement over the original baseline), 48.9% accuracy on recognising medical question entailment, 0.248 Spearman’s rho for question answering ranking and 68.6% accuracy for question answering classification.

Figurative Usage Detection of Symptom Words to Improve Personal Health Mention Detection
Adith Iyer | Aditya Joshi | Sarvnaz Karimi | Ross Sparks | Cecile Paris
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Personal health mention detection deals with predicting whether or not a given sentence is a report of a health condition. Past work mentions errors in this prediction when symptom words, i.e., names of symptoms of interest, are used in a figurative sense. Therefore, we combine a state-of-the-art figurative usage detection with CNN-based personal health mention detection. To do so, we present two methods: a pipeline-based approach and a feature augmentation-based approach. The introduction of figurative usage detection results in an average improvement of 2.21% F-score of personal health mention detection, in the case of the feature augmentation-based approach. This paper demonstrates the promise of using figurative usage detection to improve personal health mention detection.

NNE: A Dataset for Nested Named Entity Recognition in English Newswire
Nicky Ringland | Xiang Dai | Ben Hachey | Sarvnaz Karimi | Cecile Paris | James R. Curran
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Named entity recognition (NER) is widely used in natural language processing applications and downstream tasks. However, most NER tools target flat annotation from popular datasets, eschewing the semantic information available in nested entity mentions. We describe NNE—a fine-grained, nested named entity dataset over the full Wall Street Journal portion of the Penn Treebank (PTB). Our annotation comprises 279,795 mentions of 114 entity types with up to 6 layers of nesting. We hope the public release of this large dataset for English newswire will encourage development of new techniques for nested NER.

Using Similarity Measures to Select Pretraining Data for NER
Xiang Dai | Sarvnaz Karimi | Ben Hachey | Cecile Paris
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Word vectors and Language Models (LMs) pretrained on a large amount of unlabelled data can dramatically improve various Natural Language Processing (NLP) tasks. However, the measure and impact of similarity between pretraining data and target task data are left to intuition. We propose three cost-effective measures to quantify different aspects of similarity between source pretraining and target task data. We demonstrate that these measures are good predictors of the usefulness of pretrained models for Named Entity Recognition (NER) over 30 data pairs. Results also suggest that pretrained LMs are more effective and more predictable than pretrained word vectors, but pretrained word vectors are better when pretraining data is dissimilar.

2018

Shot Or Not: Comparison of NLP Approaches for Vaccination Behaviour Detection
Aditya Joshi | Xiang Dai | Sarvnaz Karimi | Ross Sparks | Cécile Paris | C Raina MacIntyre
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

Vaccination behaviour detection deals with predicting whether or not a person received/was about to receive a vaccine. We present our submission for the vaccination behaviour detection shared task at the SMM4H workshop. Our findings are based on three prevalent text classification approaches: rule-based, statistical and deep learning-based. Our final submissions are: (1) an ensemble of statistical classifiers with task-specific features derived using lexicons, language processing tools and word embeddings; and, (2) an LSTM classifier with pre-trained language models.

2017

Medication and Adverse Event Extraction from Noisy Text
Xiang Dai | Sarvnaz Karimi | Cecile Paris
Proceedings of the Australasian Language Technology Association Workshop 2017

Automatic Diagnosis Coding of Radiology Reports: A Comparison of Deep Learning and Conventional Classification Methods
Sarvnaz Karimi | Xiang Dai | Hamed Hassanzadeh | Anthony Nguyen
BioNLP 2017

Diagnosis autocoding services and research aim to improve both the productivity of clinical coders and the accuracy of the coding. Autocoding is an important step in data analysis for funding and reimbursement, as well as health services planning and resource allocation. We investigate the applicability of deep learning to autocoding of radiology reports using the International Classification of Diseases (ICD). Deep learning methods are known to require large training data. Our goal is to explore how to use these methods when the training data is sparse, skewed and relatively small, and how their effectiveness compares to conventional methods. We identify optimal parameters that could be used in setting up a convolutional neural network for autocoding with comparable results to that of conventional methods.

2015

Squibs: Evaluation Methods for Statistically Dependent Text
Sarvnaz Karimi | Jie Yin | Jiri Baum
Computational Linguistics, Volume 41, Issue 3 - September 2015

2014

Overview of the 2014 ALTA Shared Task: Identifying Expressions of Locations in Tweets
Diego Molla | Sarvnaz Karimi
Proceedings of the Australasian Language Technology Association Workshop 2014

2013

Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)
Sarvnaz Karimi | Karin Verspoor
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

2010

Best Topic Word Selection for Topic Labelling
Jey Han Lau | David Newman | Sarvnaz Karimi | Timothy Baldwin
Coling 2010: Posters

2007

Corpus Effects on the Evaluation of Automated Transliteration Systems
Sarvnaz Karimi | Andrew Turpin | Falk Scholer
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Collapsed Consonant and Vowel Models: New Approaches for English-Persian Transliteration and Back-Transliteration
Sarvnaz Karimi | Falk Scholer | Andrew Turpin
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics