Ozlem Uzuner

Also published as: Özlem Uzuner


pdf bib
SemEval-2021 Task 10: Source-Free Domain Adaptation for Semantic Processing
Egoitz Laparra | Xin Su | Yiyun Zhao | Özlem Uzuner | Timothy Miller | Steven Bethard
Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)

This paper presents the Source-Free Domain Adaptation shared task held within SemEval-2021. The aim of the task was to explore adaptation of machine-learning models in the face of data sharing constraints. Specifically, we consider the scenario where annotations exist for a domain but cannot be shared. Instead, participants are provided with models trained on that (source) data. Participants also receive some labeled data from a new (development) domain on which to explore domain adaptation algorithms. Participants are then tested on data representing a new (target) domain. We explored this scenario with two different semantic tasks: negation detection (a text classification task) and time expression recognition (a sequence tagging task).

pdf bib
Leveraging Offensive Language for Sarcasm and Sentiment Detection in Arabic
Fatemah Husain | Ozlem Uzuner
Proceedings of the Sixth Arabic Natural Language Processing Workshop

Sarcasm detection is one of the most challenging tasks in text classification, particularly for informal Arabic with its high syntactic and semantic ambiguity. We propose two systems that harness knowledge from multiple tasks to improve the performance of the classifier. This paper presents the systems used in our participation in the two sub-tasks of the Sixth Arabic Natural Language Processing Workshop (WANLP): Sarcasm Detection and Sentiment Analysis. Our methodology is driven by the hypothesis that tweets with negative sentiment and tweets with sarcastic content are more likely to contain offensive content; thus, fine-tuning the classification model on a large corpus of offensive language supports the model in effectively detecting sentiment and sarcasm. Results demonstrate the effectiveness of our approach on the sarcasm detection task over the sentiment analysis task.

pdf bib
MNLP at MEDIQA 2021: Fine-Tuning PEGASUS for Consumer Health Question Summarization
Jooyeon Lee | Huong Dang | Ozlem Uzuner | Sam Henry
Proceedings of the 20th Workshop on Biomedical Language Processing

This paper details a Consumer Health Question (CHQ) summarization model submitted to MEDIQA 2021 for shared task 1: Question Summarization. Many CHQs are composed of multiple sentences with typos or unnecessary information, which can interfere with automated question answering systems. Question summarization mitigates this issue by removing the extraneous information, aiding automated systems in generating a more accurate summary. Our summarization approach focuses on applying multiple pre-processing techniques, including question focus identification, to the input, and on developing an ensemble method that combines question focus with an abstractive summarization method. We use the state-of-the-art abstractive summarization model PEGASUS (Pre-training with Extracted Gap-sentences for Abstractive Summarization) to generate abstractive summaries. Our experiments show that our ensemble method, which combines abstractive summarization with question focus identification, improves performance over summarization alone. Our model achieves a ROUGE-2 F-measure of 11.14% on the official test dataset.


pdf bib
SalamNET at SemEval-2020 Task 12: Deep Learning Approach for Arabic Offensive Language Detection
Fatemah Husain | Jooyeon Lee | Sam Henry | Ozlem Uzuner
Proceedings of the Fourteenth Workshop on Semantic Evaluation

This paper describes SalamNET, an Arabic offensive language detection system submitted to SemEval 2020 shared task 12: Multilingual Offensive Language Identification in Social Media. Our approach focuses on applying multiple deep learning models and conducting an in-depth error analysis of the results to provide implications for future system development. To pursue our goal, Recurrent Neural Network (RNN), Gated Recurrent Unit (GRU), and Long Short-Term Memory (LSTM) models with different design architectures were developed and evaluated. SalamNET, a Bi-directional Gated Recurrent Unit (Bi-GRU) based model, reports a macro-F1 score of 0.83.

pdf bib
Ensemble BERT for Classifying Medication-mentioning Tweets
Huong Dang | Kahyun Lee | Sam Henry | Özlem Uzuner
Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task

Twitter is a valuable source of patient-generated data that has been used in various population health studies. The first step in many of these studies is to identify and capture Twitter messages (tweets) containing medication mentions. In this article, we describe our submission to Task 1 of the Social Media Mining for Health Applications (SMM4H) Shared Task 2020. This task challenged participants to detect tweets that mention medications or dietary supplements in a natural, highly imbalanced dataset. Our system combined a handcrafted preprocessing step with an ensemble of 20 BERT-based classifiers generated by dividing the training dataset into subsets using 10-fold cross-validation and exploiting two BERT embedding models. Our system ranked first in this task and improved the average F1 score across all participating teams by 19.07%, with a precision, recall, and F1 on the test set of 83.75%, 87.01%, and 85.35%, respectively.
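The 20-classifier construction described in the abstract can be sketched in plain Python. The fold-splitting scheme, the embedding-model names, and the majority-vote combination rule below are illustrative assumptions, not the authors' exact code (the abstract does not specify how the 20 predictions are combined):

```python
from itertools import product

def make_training_subsets(examples, n_folds=10):
    """Split the training data into n_folds folds; each training subset is
    the data with one fold held out, as in 10-fold cross-validation."""
    folds = [examples[i::n_folds] for i in range(n_folds)]
    subsets = []
    for held_out in range(n_folds):
        subset = [ex for i, fold in enumerate(folds) if i != held_out
                  for ex in fold]
        subsets.append(subset)
    return subsets

def majority_vote(predictions):
    """Combine binary predictions from the ensemble members: a tweet is
    labeled 1 (mentions a medication) if more than half the models say so."""
    return [1 if sum(votes) * 2 > len(votes) else 0
            for votes in zip(*predictions)]

# Hypothetical ensemble: 10 training subsets x 2 embedding models
# = 20 (model, subset) classifier configurations.
embedding_models = ["bert-base", "biobert"]  # illustrative names only
subsets = make_training_subsets(list(range(100)))
ensemble = list(product(embedding_models, range(len(subsets))))
assert len(ensemble) == 20
```

Each configuration would be fine-tuned on its subset; at test time the 20 tweet-level predictions would be merged with `majority_vote`.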


pdf bib
CLPsych 2019 Shared Task: Predicting the Degree of Suicide Risk in Reddit Posts
Ayah Zirikly | Philip Resnik | Özlem Uzuner | Kristy Hollingshead
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology

The shared task for the 2019 Workshop on Computational Linguistics and Clinical Psychology (CLPsych’19) introduced an assessment of suicide risk based on social media postings, using data from Reddit to identify users at no, low, moderate, or severe risk. Two variations of the task focused on users whose posts to the r/SuicideWatch subreddit indicated they might be at risk; a third task looked at screening users based only on their more everyday (non-SuicideWatch) posts. We received submissions from 15 different teams, and the results demonstrate progress and provide insight into the value of the language signal in helping to predict risk level.

pdf bib
Deep Learning for Identification of Adverse Effect Mentions In Twitter Data
Paul Barry | Ozlem Uzuner
Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task

The Social Media Mining for Health Applications (SMM4H) Adverse Effect Mentions shared task challenges participants to accurately identify spans of text within a tweet that correspond to Adverse Effects (AEs) resulting from medication usage (Weissenbacher et al., 2019). This task features a training dataset of 2,367 tweets, in addition to a 1,000-tweet evaluation dataset. The solution presented here features a bidirectional Long Short-Term Memory network (bi-LSTM) for the generation of character-level embeddings. It uses a second bi-LSTM, trained on both character- and token-level embeddings, to feed a Conditional Random Field (CRF), which provides the final classification. This paper further discusses the deep learning algorithms used in our solution.
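The CRF layer on top of the bi-LSTM produces the final tag sequence by Viterbi decoding over emission and transition scores. A minimal pure-Python sketch of that decoding step follows; the two-tag score matrices in the usage example are illustrative (in the actual system these scores come from the trained bi-LSTM and CRF):

```python
def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence for one sentence under a
    linear-chain CRF. emissions[t][y] is the score assigned to tag y at
    token t; transitions[y_prev][y] scores tag bigrams."""
    n_tags = len(emissions[0])
    # best[t][y] = best score of any tag path ending in tag y at position t
    best = [emissions[0][:]]
    backptr = []
    for t in range(1, len(emissions)):
        scores, ptrs = [], []
        for y in range(n_tags):
            cands = [best[-1][yp] + transitions[yp][y] for yp in range(n_tags)]
            yp_best = max(range(n_tags), key=cands.__getitem__)
            scores.append(cands[yp_best] + emissions[t][y])
            ptrs.append(yp_best)
        best.append(scores)
        backptr.append(ptrs)
    # Trace the best path backwards from the highest-scoring final tag.
    y = max(range(n_tags), key=best[-1].__getitem__)
    path = [y]
    for ptrs in reversed(backptr):
        y = ptrs[y]
        path.append(y)
    return list(reversed(path))

# Illustrative call: two tags (e.g. O vs. AE), two tokens.
tags = viterbi_decode([[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]])
```

With zero transition scores the decoder reduces to per-token argmax; the learned transition matrix is what lets the CRF penalize invalid tag bigrams.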


pdf bib
Enhancing Cohesion and Coherence of Fake Text to Improve Believability for Deceiving Cyber Attackers
Prakruthi Karuna | Hemant Purohit | Özlem Uzuner | Sushil Jajodia | Rajesh Ganesan
Proceedings of the First International Workshop on Language Cognition and Computational Models

Ever-increasing ransomware attacks and thefts of intellectual property demand cybersecurity solutions to protect critical documents. One emerging solution is to place fake text documents in the repository of critical documents to deceive and catch cyber attackers. We can generate fake text documents by obscuring the salient information in legitimate text documents. However, the obscuring process can result in linguistic inconsistencies, such as broken co-references and an illogical flow of ideas across sentences, which can expose the fake document and render it unbelievable. In this paper, we propose a novel method to generate believable fake text documents by automatically improving the linguistic consistency of computer-generated fake text. Our method focuses on enhancing syntactic cohesion and semantic coherence across discourse segments. We conduct experiments with human subjects to evaluate the effect of believability improvements in distinguishing legitimate texts from fake texts. Results show that the probability of distinguishing legitimate texts from believable fake texts is consistently lower than from fake texts that have not been improved in believability. This indicates the effectiveness of our method in generating believable fake text.


pdf bib
Feature-Augmented Neural Networks for Patient Note De-identification
Ji Young Lee | Franck Dernoncourt | Özlem Uzuner | Peter Szolovits
Proceedings of the Clinical Natural Language Processing Workshop (ClinicalNLP)

Patient notes contain a wealth of information of potentially great interest to medical investigators. However, to protect patients’ privacy, Protected Health Information (PHI) must be removed from the patient notes before they can be legally released, a process known as patient note de-identification. The main objective for a de-identification system is to have the highest possible recall. Recently, the first neural-network-based de-identification system has been proposed, yielding state-of-the-art results. Unlike other systems, it does not rely on human-engineered features, which allows it to be quickly deployed, but it does not leverage knowledge from human experts or from electronic health records (EHRs). In this work, we explore a method to incorporate human-engineered features as well as features derived from EHRs into a neural-network-based de-identification system. Our results show that the addition of features, especially the EHR-derived features, further improves the state-of-the-art in patient note de-identification, including for some of the most sensitive PHI types such as patient names. Since in a real-life setting patient notes typically come with EHRs, we recommend that developers of de-identification systems leverage the information EHRs contain.


pdf bib
Biomedical/Clinical NLP
Ozlem Uzuner | Meliha Yetişgen | Amber Stubbs
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Tutorial Abstracts


pdf bib
Extracting Medication Information from Discharge Summaries
Scott Halgrim | Fei Xia | Imre Solti | Eithon Cadag | Özlem Uzuner
Proceedings of the NAACL HLT 2010 Second Louhi Workshop on Text and Data Mining of Health Documents

pdf bib
Does negation really matter?
Ira Goldstein | Özlem Uzuner
Proceedings of the Workshop on Negation and Speculation in Natural Language Processing


pdf bib
Role of Local Context in Automatic Deidentification of Ungrammatical, Fragmented Text
Tawanda Sibanda | Ozlem Uzuner
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference


pdf bib
A Comparative Study of Language Models for Book and Author Recognition
Özlem Uzuner | Boris Katz
Second International Joint Conference on Natural Language Processing: Full Papers

pdf bib
Lexical Chains and Sliding Locality Windows in Content-based Text Similarity Detection
Thade Nahnsen | Özlem Uzuner | Boris Katz
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

pdf bib
Using Syntactic Information to Identify Plagiarism
Özlem Uzuner | Boris Katz | Thade Nahnsen
Proceedings of the Second Workshop on Building Educational Applications Using NLP