Joseph Cornelius


2022

mattica@SMM4H’22: Leveraging sentiment for stance & premise joint learning
Oscar Lithgow-Serrano | Joseph Cornelius | Fabio Rinaldi | Ljiljana Dolamic
Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task

This paper describes our submissions to the Social Media Mining for Health Applications (SMM4H) shared task 2022. Our team (mattica) participated in detecting stances and premises in tweets about health mandates related to COVID-19 (Task 2). Our approach was based on an in-domain Pretrained Language Model, which we fine-tuned by combining different strategies such as leveraging an additional stance detection dataset through two-stage fine-tuning, jointly learning the stance and premise detection objectives, and ensembling the sentiment polarity given by an off-the-shelf fine-tuned model.

2021

Approaching SMM4H with auto-regressive language models and back-translation
Joseph Cornelius | Tilia Ellendorff | Fabio Rinaldi
Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task

We describe our submissions to the 6th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (OGNLP) participated in the sub-task: Classification of tweets self-reporting potential cases of COVID-19 (Task 5). For our submissions, we employed systems based on auto-regressive transformer models (XLNet) and back-translation for balancing the dataset.
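
As a rough illustration of back-translation used for class balancing, the sketch below adds paraphrased copies of under-represented examples. It is not the authors' implementation: the function `balance_with_back_translation`, the stand-in `toy_back_translate`, and the sample tweets are all hypothetical, and a real system would replace the stand-in with a round trip through a machine translation model.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (tweet text, class label)


def balance_with_back_translation(
    data: List[Example],
    back_translate: Callable[[str], str],
) -> List[Example]:
    """Add paraphrased copies of under-represented examples.

    Each original example is paraphrased at most once, so a class can
    at most double in size; it may still end up smaller than the
    largest class.
    """
    counts = Counter(label for _, label in data)
    target = max(counts.values())
    augmented = list(data)
    seen = {text for text, _ in data}
    for label, count in counts.items():
        needed = target - count
        pool = [text for text, lab in data if lab == label]
        for text in pool:
            if needed <= 0:
                break
            paraphrase = back_translate(text)
            # Skip paraphrases that duplicate an existing example.
            if paraphrase not in seen:
                seen.add(paraphrase)
                augmented.append((paraphrase, label))
                needed -= 1
    return augmented


def toy_back_translate(text: str) -> str:
    # Hypothetical stand-in: a real system would translate the text
    # into a pivot language and back to obtain a paraphrase.
    return text.replace("I have", "I've got")


sample = [
    ("I have a fever and a cough", 1),
    ("nice weather today", 0),
    ("going to the gym", 0),
    ("new phone arrived", 0),
]
balanced = balance_with_back_translation(sample, toy_back_translate)
```

With this toy data the single positive example gains one paraphrase, so the positive class grows from one to two examples while the negative class is left unchanged.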

2020

COVID-19 Twitter Monitor: Aggregating and Visualizing COVID-19 Related Trends in Social Media
Joseph Cornelius | Tilia Ellendorff | Lenz Furrer | Fabio Rinaldi
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

Social media platforms offer extensive information about the development of the COVID-19 pandemic and the current state of public health. In recent years, the Natural Language Processing community has developed a variety of methods to extract health-related information from posts on social media platforms. For these techniques to be used by a broad public, they must be aggregated and presented in a user-friendly way. We have aggregated ten methods to analyze tweets related to the COVID-19 pandemic, and present interactive visualizations of the results on our online platform, the COVID-19 Twitter Monitor. In the current version of our platform, we offer distinct methods for inspecting the dataset at different levels: corpus-wide, single post, and spans within each post. In addition, we allow different methods to be combined to enable a more selective acquisition of knowledge. Through the visual and interactive combination of various methods, interconnections between the different outputs can be revealed.

2019

UZH@CRAFT-ST: a Sequence-labeling Approach to Concept Recognition
Lenz Furrer | Joseph Cornelius | Fabio Rinaldi
Proceedings of the 5th Workshop on BioNLP Open Shared Tasks

As our submission to the CRAFT shared task 2019, we present two neural approaches to concept recognition. We propose two different systems for joint named entity recognition (NER) and normalization (NEN), both of which model the task as a sequence labeling problem. Our first system is a BiLSTM network with two separate outputs for NER and NEN trained from scratch, whereas the second system is an instance of BioBERT fine-tuned on the concept-recognition task. We exploit two strategies for extending concept coverage: ontology pretraining and backoff with a dictionary lookup. Our results show that the backoff strategy effectively tackles the problem of unseen concepts, addressing a major limitation of the chosen design. In the cross-system comparison, BioBERT proves to be a strong basis for creating a concept-recognition system, although some entity types are predicted more accurately by the BiLSTM-based system.
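
The idea of backing off to a dictionary lookup can be sketched in a few lines. This is only one possible reading of the strategy, not the system described in the paper: the function `label_with_backoff`, the stand-in tagger `toy_predict`, and the concept identifiers are hypothetical, and the sketch assumes a tagger that returns `None` for tokens it cannot map to a concept.

```python
from typing import Callable, Dict, List, Optional

Labels = List[Optional[str]]


def label_with_backoff(
    tokens: List[str],
    predict: Callable[[List[str]], Labels],
    dictionary: Dict[str, str],
) -> Labels:
    """Use the tagger's concept ID where it has one, else the dictionary.

    Tokens unknown to both the tagger and the dictionary stay None.
    """
    predictions = predict(tokens)
    return [
        pred if pred is not None else dictionary.get(token.lower())
        for token, pred in zip(tokens, predictions)
    ]


def toy_predict(tokens: List[str]) -> Labels:
    # Hypothetical stand-in for a trained sequence labeler that has
    # only ever seen the concept "cell" during training.
    return ["ID:cell" if token.lower() == "cell" else None for token in tokens]


# Hypothetical dictionary mapping surface forms to concept IDs.
toy_dictionary = {"nucleus": "ID:nucleus"}

labels = label_with_backoff(["The", "cell", "nucleus"], toy_predict, toy_dictionary)
```

Here the tagger's prediction takes priority, and the dictionary only fills in positions where the tagger abstains, which is how a lookup can cover concepts that were absent from the training data.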

2018

UZH@SMM4H: System Descriptions
Tilia Ellendorff | Joseph Cornelius | Heath Gordon | Nicola Colic | Fabio Rinaldi
Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task

Our team at the University of Zürich participated in the first three of the four sub-tasks of the Social Media Mining for Health Applications (SMM4H) shared task. We experimented with different approaches to text classification, namely traditional feature-based classifiers (Logistic Regression and Support Vector Machines), shallow neural networks, RCNNs, and CNNs. This system description paper provides details regarding the different system architectures and the achieved results.