Barbara Di Eugenio

Also published as: Barbara Di Eugenio

2025

pdf bib
Proceedings of the 31st International Conference on Computational Linguistics
Owen Rambow | Leo Wanner | Marianna Apidianaki | Hend Al-Khalifa | Barbara Di Eugenio | Steven Schockaert
Proceedings of the 31st International Conference on Computational Linguistics

pdf bib abs
Unveiling Performance Challenges of Large Language Models in Low-Resource Healthcare: A Demographic Fairness Perspective
Yue Zhou | Barbara Di Eugenio | Lu Cheng
Proceedings of the 31st International Conference on Computational Linguistics

This paper studies the performance of large language models (LLMs), particularly regarding demographic fairness, in solving real-world healthcare tasks. We evaluate state-of-the-art LLMs with three prevalent learning frameworks across six diverse healthcare tasks and find significant challenges in applying LLMs to real-world healthcare tasks and persistent fairness issues across demographic groups. We also find that explicitly providing demographic information yields mixed results, while LLM’s ability to infer such details raises concerns about biased health predictions. Utilizing LLMs as autonomous agents with access to up-to-date guidelines does not guarantee performance improvement. We believe these findings reveal the critical limitations of LLMs in healthcare fairness and the urgent need for specialized research in this area.

2024

We propose a dialogue system that enables heart failure patients to inquire about salt content in foods and help them monitor and reduce salt intake. Addressing the lack of specific datasets for food-based salt content inquiries, we develop a template-based conversational dataset. The dataset is structured to ask clarification questions to identify food items and their salt content. Our findings indicate that while fine-tuning transformer-based models on the dataset yields limited performance, the integration of Neuro-Symbolic Rules significantly enhances the system’s performance. Our experiments show that by integrating neuro-symbolic rules, our system achieves an improvement in joint goal accuracy of over 20% across different data sizes compared to naively fine-tuning transformer-based models.

pdf bib abs
Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
Yue Zhou | Henry Peng Zou | Barbara Di Eugenio | Yang Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

We find that language models have difficulties generating fallacious and deceptive reasoning. When asked to generate deceptive outputs, language models tend to leak honest counterparts but believe them to be false. Exploiting this deficiency, we propose a jailbreak attack method that elicits an aligned language model for malicious output. Specifically, we query the model to generate a fallacious yet deceptively real procedure for the harmful behavior. Since a fallacious procedure is generally considered fake and thus harmless by LLMs, it helps bypass the safeguard mechanism. Yet the output is factually harmful since the LLM cannot fabricate fallacious solutions but proposes truthful ones. We evaluate our approach over five safety-aligned large language models, comparing four previous jailbreak methods, and show that our approach achieves competitive performance with more harmful outputs. We believe the findings could be extended beyond model safety, such as self-verification and hallucination.

pdf bib abs
CALAMR: Component ALignment for Abstract Meaning Representation
Paul Landes | Barbara Di Eugenio
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present Component ALignment for Abstract Meaning Representation (Calamr), a novel method for graph alignment that can support summarization and its evaluation. First, our method produces graphs that explain what is summarized through their alignments, which can be used to train graph based summarization learners. Second, although numerous scoring methods have been proposed for abstract meaning representation (AMR) that evaluate semantic similarity, no AMR based summarization metrics exist despite years of work using AMR for this task. Calamr provides alignments on which new scores can be based. The contributions of this work include a) a novel approach to aligning AMR graphs, b) a new summarization based scoring methods for similarity of AMR subgraphs composed of one or more sentences, and c) the entire reusable source code to reproduce our results.

pdf bib abs
Modeling Low-Resource Health Coaching Dialogues via Neuro-Symbolic Goal Summarization and Text-Units-Text Generation
Yue Zhou | Barbara Di Eugenio | Brian Ziebart | Lisa Sharp | Bing Liu | Nikolaos Agadakos
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Health coaching helps patients achieve personalized and lifestyle-related goals, effectively managing chronic conditions and alleviating mental health issues. It is particularly beneficial, however cost-prohibitive, for low-socioeconomic status populations due to its highly personalized and labor-intensive nature. In this paper, we propose a neuro-symbolic goal summarizer to support health coaches in keeping track of the goals and a text-units-text dialogue generation model that converses with patients and helps them create and accomplish specific goals for physical activities. Our models outperform previous state-of-the-art while eliminating the need for predefined schema and corresponding annotation. We also propose a new health coaching dataset extending previous work and a metric to measure the unconventionality of the patient’s response based on data difficulty, facilitating potential coach alerts during deployment.

pdf bib abs
RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian
Krenare Pireva Nuci | Paul Landes | Barbara Di Eugenio
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The education domain has been a popular area of collaboration with NLP researchers for decades. However, many recent breakthroughs, such as large transformer based language models, have provided new opportunities for solving interesting, but difficult problems. One such problem is assigning sentiment to reviews of educators’ performance. We present EduSenti: a corpus of 1,163 Albanian and 624 English reviews of educational instructor’s performance reviews annotated for sentiment, emotion and educational topic. In this work, we experiment with fine-tuning several language models on the EduSenti corpus and then compare with an Albanian masked language trained model from the last XLM-RoBERTa checkpoint. We show promising results baseline results, which include an F1 of 71.9 in Albanian and 73.8 in English. Our contributions are: (i) a sentiment analysis corpus in Albanian and English, (ii) a large Albanian corpus of crawled data useful for unsupervised training of language models, and (iii) the source code for our experiments.

2023

pdf bib abs
Hospital Discharge Summarization Data Provenance
Paul Landes | Aaron Chaise | Kunal Patel | Sean Huang | Barbara Di Eugenio
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Summarization of medical notes has been studied for decades with hospital discharge summaries garnering recent interest in the research community. While methods for summarizing these notes have been the focus, there has been little work in understanding the feasibility of this task. We believe this effort is warranted given the notes’ length and complexity, and that they are often riddled with poorly formatted structured data and redundancy in copy and pasted text. In this work, we investigate the feasibility of the summarization task by finding the origin, or data provenance, of the discharge summary’s source text. As a motivation to understanding the data challenges of the summarization task, we present DSProv, a new dataset of 51 hospital admissions annotated by clinical informatics physicians. The dataset is analyzed for semantics and the extent of copied text from human authored electronic health record (EHR) notes. We also present a novel unsupervised method of matching notes used in discharge summaries, and release our annotation dataset1 and source code to the community.

pdf bib abs
DeepZensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility
Paul Landes | Barbara Di Eugenio | Cornelia Caragea
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

Given the criticality and difficulty of reproducing machine learning experiments, there have been significant efforts in reducing the variance of these results. The ability to consistently reproduce results effectively strengthens the underlying hypothesis of the work and should be regarded as important as the novel aspect of the research itself. The contribution of this work is an open source framework that has the following characteristics: a) facilitates reproducing consistent results, b) allows hot-swapping features and embeddings without further processing and re-vectorizing the dataset, c) provides a means of easily creating, training and evaluating natural language processing deep learning models with little to no code changes, and d) is freely available to the community.

pdf bib abs
Reference Resolution and New Entities in Exploratory Data Visualization: From Controlled to Unconstrained Interactions with a Conversational Assistant
Abari Bhattacharya | Abhinav Kumar | Barbara Di Eugenio | Roderick Tabalba | Jillian Aurisano | Veronica Grosso | Andrew Johnson | Jason Leigh | Moira Zellner
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

In the context of data visualization, as in other grounded settings, referents are created by the task the agents engage in and are salient because they belong to the shared physical setting. Our focus is on resolving references to visualizations on large displays; crucially, reference resolution is directly involved in the process of creating new entities, namely new visualizations. First, we developed a reference resolution model for a conversational assistant. We trained the assistant on controlled dialogues for data visualizations involving a single user. Second, we ported the conversational assistant including its reference resolution model to a different domain, supporting two users collaborating on a data exploration task. We explore how the new setting affects reference detection and resolution; we compare the performance in the controlled vs unconstrained setting, and discuss the general lessons that we draw from this adaptation.

2022

Health coaching helps patients identify and accomplish lifestyle-related goals, effectively improving the control of chronic diseases and mitigating mental health conditions. However, health coaching is cost-prohibitive due to its highly personalized and labor-intensive nature. In this paper, we propose to build a dialogue system that converses with the patients, helps them create and accomplish specific goals, and can address their emotions with empathy. However, building such a system is challenging since real-world health coaching datasets are limited and empathy is subtle. Thus, we propose a modularized health coaching dialogue with simplified NLU and NLG frameworks combined with mechanism-conditioned empathetic response generation. Through automatic and human evaluation, we show that our system generates more empathetic, fluent, and coherent responses and outperforms the state-of-the-art in NLU tasks while requiring less annotation. We view our approach as a key step towards building automated and more accessible health coaching systems.

The process by which sections in a document are demarcated and labeled is known as section identification. Such sections are helpful to the reader when searching for information and contextualizing specific topics. The goal of this work is to segment the sections of clinical medical domain documentation. The primary contribution of this work is MedSecId, a publicly available set of 2,002 fully annotated medical notes from the MIMIC-III. We include several baselines, source code, a pretrained model and analysis of the data showing a relationship between medical concepts across sections using principal component analysis.

2021

pdf bib abs
Summarizing Behavioral Change Goals from SMS Exchanges to Support Health Coaches
Itika Gupta | Barbara Di Eugenio | Brian D. Ziebart | Bing Liu | Ben S. Gerber | Lisa K. Sharp
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Regular physical activity is associated with a reduced risk of chronic diseases such as type 2 diabetes and improved mental well-being. Yet, more than half of the US population is insufficiently active. Health coaching has been successful in promoting healthy behaviors. In this paper, we present our work towards assisting health coaches by extracting the physical activity goal the user and coach negotiate via text messages. We show that information captured by dialogue acts can help to improve the goal extraction results. We employ both traditional and transformer-based machine learning models for dialogue acts prediction and find them statistically indistinguishable in performance on our health coaching dataset. Moreover, we discuss the feedback provided by the health coaches when evaluating the correctness of the extracted goal summaries. This work is a step towards building a virtual assistant health coach to promote a healthy lifestyle.

2020

pdf bib abs
Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization
Abhinav Kumar | Barbara Di Eugenio | Jillian Aurisano | Andrew Johnson
Proceedings of the Twelfth Language Resources and Evaluation Conference

Our goal is to develop an intelligent assistant to support users explore data via visualizations. We have collected a new corpus of conversations, CHICAGO-CRIME-VIS, geared towards supporting data visualization exploration, and we have annotated it for a variety of features, including contextualized dialogue acts. In this paper, we describe our strategies and their evaluation for dialogue act classification. We highlight how thinking aloud affects interpretation of dialogue acts in our setting and how to best capture that information. A key component of our strategy is data augmentation as applied to the training data, since our corpus is inherently small. We ran experiments with the Balanced Bagging Classifier (BAGC), Condiontal Random Field (CRF), and several Long Short Term Memory (LSTM) networks, and found that all of them improved compared to the baseline (e.g., without the data augmentation pipeline). CRF outperformed the other classification algorithms, with the LSTM networks showing modest improvement, even after obtaining a performance boost from domain-trained word embeddings. This result is of note because training a CRF is far less resource-intensive than training deep learning models, hence given a similar if not better performance, traditional methods may still be preferable in order to lower resource consumption.

pdf bib abs
A Corpus for Visual Question Answering Annotated with Frame Semantic Information
Mehrdad Alizadeh | Barbara Di Eugenio
Proceedings of the Twelfth Language Resources and Evaluation Conference

Visual Question Answering (VQA) has been widely explored as a computer vision problem, however enhancing VQA systems with linguistic information is necessary for tackling the complexity of the task. The language understanding part can play a major role especially for questions asking about events or actions expressed via verbs. We hypothesize that if the question focuses on events described by verbs, then the model should be aware of or trained with verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. We created a new VQA dataset annotated with verb semantic information called imSituVQA. imSituVQA is built by taking advantage of the imSitu dataset annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet.

pdf bib abs
Heart Failure Education of African American and Hispanic/Latino Patients: Data Collection and Analysis
Itika Gupta | Barbara Di Eugenio | Devika Salunke | Andrew Boyd | Paula Allen-Meares | Carolyn Dickens | Olga Garcia
Proceedings of the First Workshop on Natural Language Processing for Medical Conversations

Heart failure is a global epidemic with debilitating effects. People with heart failure need to actively participate in home self-care regimens to maintain good health. However, these regimens are not as effective as they could be and are influenced by a variety of factors. Patients from minority communities like African American (AA) and Hispanic/Latino (H/L), often have poor outcomes compared to the average Caucasian population. In this paper, we lay the groundwork to develop an interactive dialogue agent that can assist AA and H/L patients in a culturally sensitive and linguistically accurate manner with their heart health care needs. This will be achieved by extracting relevant educational concepts from the interactions between health educators and patients. Thus far we have recorded and transcribed 20 such interactions. In this paper, we describe our data collection process, thematic and initiative analysis of the interactions, and outline our future steps.

pdf bib abs
Detecting and understanding moral biases in news
Usman Shahid | Barbara Di Eugenio | Andrew Rojecki | Elena Zheleva
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

We describe work in progress on detecting and understanding the moral biases of news sources by combining framing theory with natural language processing. First we draw connections between issue-specific frames and moral frames that apply to all issues. Then we analyze the connection between moral frame presence and news source political leaning. We develop and test a simple classification model for detecting the presence of a moral frame, highlighting the need for more sophisticated models. We also discuss some of the annotation and frame detection challenges that can inform future research in this area.

Our goal is to develop and deploy a virtual assistant health coach that can help patients set realistic physical activity goals and live a more active lifestyle. Since there is no publicly shared dataset of health coaching dialogues, the first phase of our research focused on data collection. We hired a certified health coach and 28 patients to collect the first round of human-human health coaching interaction which took place via text messages. This resulted in 2853 messages. The data collection phase was followed by conversation analysis to gain insight into the way information exchange takes place between a health coach and a patient. This was formalized using two annotation schemas: one that focuses on the goals the patient is setting and another that models the higher-level structure of the interactions. In this paper, we discuss these schemas and briefly talk about their application for automatically extracting activity goals and annotating the second round of data, collected with different health coaches and patients. Given the resource-intensive nature of data annotation, successfully annotating a new dataset automatically is key to answer the need for high quality, large datasets.

2019

Patients with chronic conditions like heart failure are the most likely to be re-hospitalized. One step towards avoiding re-hospitalization is to devise strategies for motivating patients to take care of their own health. In this paper, we perform a quantitative analysis of patients’ narratives of their experience with heart failure and explore the different topics that patients talk about. We compare two different groups of patients- those unable to take charge of their illness, and those who make efforts to improve their health. We will use the findings from our analysis to refine and personalize the summaries of hospitalizations that our system automatically generates.

2018

pdf bib abs
Towards Generating Personalized Hospitalization Summaries
Sabita Acharya | Barbara Di Eugenio | Andrew Boyd | Richard Cameron | Karen Dunn Lopez | Pamela Martyn-Nemeth | Carolyn Dickens | Amer Ardati
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

Most of the health documents, including patient education materials and discharge notes, are usually flooded with medical jargons and contain a lot of generic information about the health issue. In addition, patients are only provided with the doctor’s perspective of what happened to them in the hospital while the care procedure performed by nurses during their entire hospital stay is nowhere included. The main focus of this research is to generate personalized hospital-stay summaries for patients by combining information from physician discharge notes and nursing plan of care. It uses a metric to identify medical concepts that are Complex, extracts definitions for the concept from three external knowledge sources, and provides the simplest definition to the patient. It also takes various features of the patient into account, like their concerns and strengths, ability to understand basic health information, level of engagement in taking care of their health, and familiarity with the health issue and personalizes the content of the summaries accordingly. Our evaluation showed that the summaries contain 80% of the medical concepts that are considered as being important by both doctor and nurses. Three patient advisors (i.e. individuals who are trained in understanding patient experience extensively) verified the usability of our summaries and mentioned that they would like to get such summaries when they are discharged from hospital.

2016

pdf bib
Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market
Rachel Harsley | Bhavesh Gupta | Barbara Di Eugenio | Huayi Li
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

pdf bib
Towards a dialogue system that supports rich visualizations of data
Abhinav Kumar | Jillian Aurisano | Barbara Di Eugenio | Andrew Johnson | Alberto Gonzalez | Jason Leigh
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf bib
Generating summaries of hospitalizations: A new metric to assess the complexity of medical terms and their definitions
Sabita Acharya | Barbara Di Eugenio | Andrew D. Boyd | Karen Dunn Lopez | Richard Cameron | Gail M Keenan
Proceedings of the 9th International Natural Language Generation conference

2014

Aggregation of long lists of concepts is important to avoid overwhelming a small display. Focusing on the domain of mobile local search, this paper presents the development of an application to perform filtering and aggregation of results obtained through the Yahoo! Local web service. First, we performed an analysis of the data available through Yahoo! Local by crawling its database with over 170 thousand local listings located in Chicago. Then, we compiled resources and developed algorithms to filter and aggregate local search results. The methods developed exploit Yahoo!s listings categorization to reduce the result space and pinpoint the category containing the most relevant results. Finally, we evaluated a prototype through a user study, which pitted our system against Yahoo! Local and against a plain list of search results. The results obtained from the study show that our aggregation methods are quite effective, cutting down the number of entries returned to the user by 43% on average, but leaving search efficiency and user satisfaction unaffected.

pdf bib
KSC-PaL: A Peer Learning Agent that Encourages Students to take the Initiative
Cynthia Kersey | Barbara Di Eugenio | Pamela Jordan | Sandra Katz
Proceedings of the NAACL HLT 2010 Demonstration Session

pdf bib
Generating Fine-Grained Reviews of Songs from Album Reviews
Swati Tata | Barbara Di Eugenio
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf bib
A Lucene and Maximum Entropy Model Based Hedge Detection System
Lin Chen | Barbara Di Eugenio
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

2009

pdf bib
An effective Discourse Parser that uses Rich Linguistic Information
Rajen Subba | Barbara Di Eugenio
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

pdf bib
KSC-PaL: A Peer Learning Agent that Encourages Students to take the Initiative
Cynthia Kersey | Barbara Di Eugenio | Pamela Jordan | Sandra Katz
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications

2008

pdf bib abs
From Extracting to Abstracting: Generating Quasi-abstractive Summaries
Zhuli Xie | Barbara Di Eugenio | Peter C. Nelson
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we investigate quasi-abstractive summaries, a new type of machine-generated summaries that do not use whole sentences, but only fragments from the source. Quasi-abstractive summaries aim at bridging the gap between human-written abstracts and extractive summaries. We present an approach that learns how to identify sets of sentences, where each set contains fragments that can be used to produce one sentence in the abstract; and then uses these sets to produce the abstract itself. Our experiments show very promising results. Importantly, we obtain our best results when the summary generation is anchored by the most salient Noun Phrases predicted from the text to be summarized.

pdf bib abs
I saw TREE trees in the park: How to Correct Real-Word Spelling Mistakes
Davide Fossati | Barbara Di Eugenio
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents a context sensitive spell checking system that uses mixed trigram models, and introduces a new empirically grounded method for building confusion sets. The proposed method has been implemented, tested, and evaluated in terms of coverage, precision, and recall. The results show that the method is effective.

pdf bib
Simple but effective feedback generation to tutor abstract problem solving
Xin Lu | Barbara Di Eugenio | Stellan Ohlsson | Davide Fossati
Proceedings of the Fifth International Natural Language Generation Conference

2006

pdf bib abs
Building lexical resources for PrincPar, a large coverage parser that generates principled semantic representations
Rajen Subba | Barbara Di Eugenio | Elena Terenzi
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Parsing, one of the more successful areas of Natural Language Processing has mostly been concerned with syntactic structure. Though uncovering the syntactic structure of sentences is very important, in many applications a meaningrepresentation for the input must be derived as well. We report on PrincPar, a parser that builds full meaning representations. It integrates LCFLEX, a robust parser, with alexicon and ontology derived from two lexical resources, VerbNet and CoreLex that represent the semantics of verbs and nouns respectively. We show that these two different lexical resources that focus on verbs and nouns can be successfully integrated. We report parsing results on a corpus of instructional text and assess the coverage of those lexical resources. Our evaluation metric is the number of verb frames that are assigned a correct semantics: 72.2% verb frames are assigned a perfect semantics, and another 10.9% are assigned a partially correctsemantics. Our ultimate goal is to develop a (semi)automatic method to derive domain knowledge from instructional text, in the form of linguistically motivated action schemes.

pdf bib
Discourse Parsing: Learning FOL Rules based on Rich Verb Semantic Representations to automatically label Rhetorical Relations
Rajen Subba | Barbara Di Eugenio | Su Nam Kim
Proceedings of the Workshop on Learning Structured Information in Natural Language Applications