Barbara Di Eugenio


2024

Large Language Models Are Involuntary Truth-Tellers: Exploiting Fallacy Failure for Jailbreak Attacks
Yue Zhou | Henry Peng Zou | Barbara Di Eugenio | Yang Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

We find that language models have difficulties generating fallacious and deceptive reasoning. When asked to generate deceptive outputs, language models tend to leak honest counterparts but believe them to be false. Exploiting this deficiency, we propose a jailbreak attack method that elicits malicious output from an aligned language model. Specifically, we query the model to generate a fallacious yet deceptively real procedure for the harmful behavior. Since a fallacious procedure is generally considered fake and thus harmless by LLMs, it helps bypass the safeguard mechanism. Yet the output is factually harmful since the LLM cannot fabricate fallacious solutions but proposes truthful ones. We evaluate our approach over five safety-aligned large language models, comparing against four previous jailbreak methods, and show that our approach achieves competitive performance with more harmful outputs. We believe the findings could be extended beyond model safety, such as self-verification and hallucination.

A Neuro-Symbolic Approach to Monitoring Salt Content in Food
Anuja Tayal | Barbara Di Eugenio | Devika Salunke | Andrew D. Boyd | Carolyn A. Dickens | Eulalia P. Abril | Olga Garcia-Bedoya | Paula G. Allen-Meares
Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024

We propose a dialogue system that enables heart failure patients to inquire about salt content in foods and helps them monitor and reduce salt intake. Addressing the lack of specific datasets for food-based salt content inquiries, we develop a template-based conversational dataset. The dataset is structured to ask clarification questions to identify food items and their salt content. Our findings indicate that while fine-tuning transformer-based models on the dataset yields limited performance, the integration of Neuro-Symbolic Rules significantly enhances the system’s performance. Our experiments show that by integrating neuro-symbolic rules, our system achieves an improvement in joint goal accuracy of over 20% across different data sizes compared to naively fine-tuning transformer-based models.

CALAMR: Component ALignment for Abstract Meaning Representation
Paul Landes | Barbara Di Eugenio
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

We present Component ALignment for Abstract Meaning Representation (Calamr), a novel method for graph alignment that can support summarization and its evaluation. First, our method produces graphs that explain what is summarized through their alignments, which can be used to train graph-based summarization learners. Second, although numerous scoring methods have been proposed for abstract meaning representation (AMR) that evaluate semantic similarity, no AMR-based summarization metrics exist despite years of work using AMR for this task. Calamr provides alignments on which new scores can be based. The contributions of this work include a) a novel approach to aligning AMR graphs, b) new summarization-based scoring methods for similarity of AMR subgraphs composed of one or more sentences, and c) the entire reusable source code to reproduce our results.

Modeling Low-Resource Health Coaching Dialogues via Neuro-Symbolic Goal Summarization and Text-Units-Text Generation
Yue Zhou | Barbara Di Eugenio | Brian Ziebart | Lisa Sharp | Bing Liu | Nikolaos Agadakos
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Health coaching helps patients achieve personalized and lifestyle-related goals, effectively managing chronic conditions and alleviating mental health issues. It is particularly beneficial, yet cost-prohibitive, for low-socioeconomic status populations due to its highly personalized and labor-intensive nature. In this paper, we propose a neuro-symbolic goal summarizer to support health coaches in keeping track of the goals and a text-units-text dialogue generation model that converses with patients and helps them create and accomplish specific goals for physical activities. Our models outperform previous state-of-the-art while eliminating the need for predefined schema and corresponding annotation. We also propose a new health coaching dataset extending previous work and a metric to measure the unconventionality of the patient’s response based on data difficulty, facilitating potential coach alerts during deployment.

RoBERTa Low Resource Fine Tuning for Sentiment Analysis in Albanian
Krenare Pireva Nuci | Paul Landes | Barbara Di Eugenio
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The education domain has been a popular area of collaboration with NLP researchers for decades. However, many recent breakthroughs, such as large transformer-based language models, have provided new opportunities for solving interesting but difficult problems. One such problem is assigning sentiment to reviews of educators’ performance. We present EduSenti: a corpus of 1,163 Albanian and 624 English reviews of educational instructors’ performance annotated for sentiment, emotion and educational topic. In this work, we experiment with fine-tuning several language models on the EduSenti corpus and then compare with an Albanian masked language trained model from the last XLM-RoBERTa checkpoint. We show promising baseline results, which include an F1 of 71.9 in Albanian and 73.8 in English. Our contributions are: (i) a sentiment analysis corpus in Albanian and English, (ii) a large Albanian corpus of crawled data useful for unsupervised training of language models, and (iii) the source code for our experiments.

2023

Hospital Discharge Summarization Data Provenance
Paul Landes | Aaron Chaise | Kunal Patel | Sean Huang | Barbara Di Eugenio
The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks

Summarization of medical notes has been studied for decades with hospital discharge summaries garnering recent interest in the research community. While methods for summarizing these notes have been the focus, there has been little work in understanding the feasibility of this task. We believe this effort is warranted given the notes’ length and complexity, and that they are often riddled with poorly formatted structured data and redundancy in copied and pasted text. In this work, we investigate the feasibility of the summarization task by finding the origin, or data provenance, of the discharge summary’s source text. As motivation for understanding the data challenges of the summarization task, we present DSProv, a new dataset of 51 hospital admissions annotated by clinical informatics physicians. The dataset is analyzed for semantics and the extent of copied text from human authored electronic health record (EHR) notes. We also present a novel unsupervised method of matching notes used in discharge summaries, and release our annotation dataset and source code to the community.

Reference Resolution and New Entities in Exploratory Data Visualization: From Controlled to Unconstrained Interactions with a Conversational Assistant
Abari Bhattacharya | Abhinav Kumar | Barbara Di Eugenio | Roderick Tabalba | Jillian Aurisano | Veronica Grosso | Andrew Johnson | Jason Leigh | Moira Zellner
Proceedings of the 24th Annual Meeting of the Special Interest Group on Discourse and Dialogue

In the context of data visualization, as in other grounded settings, referents are created by the task the agents engage in and are salient because they belong to the shared physical setting. Our focus is on resolving references to visualizations on large displays; crucially, reference resolution is directly involved in the process of creating new entities, namely new visualizations. First, we developed a reference resolution model for a conversational assistant. We trained the assistant on controlled dialogues for data visualizations involving a single user. Second, we ported the conversational assistant including its reference resolution model to a different domain, supporting two users collaborating on a data exploration task. We explore how the new setting affects reference detection and resolution; we compare the performance in the controlled vs unconstrained setting, and discuss the general lessons that we draw from this adaptation.

DeepZensols: A Deep Learning Natural Language Processing Framework for Experimentation and Reproducibility
Paul Landes | Barbara Di Eugenio | Cornelia Caragea
Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)

Given the criticality and difficulty of reproducing machine learning experiments, there have been significant efforts in reducing the variance of these results. The ability to consistently reproduce results effectively strengthens the underlying hypothesis of the work and should be regarded as important as the novel aspect of the research itself. The contribution of this work is an open source framework that has the following characteristics: a) facilitates reproducing consistent results, b) allows hot-swapping features and embeddings without further processing and re-vectorizing the dataset, c) provides a means of easily creating, training and evaluating natural language processing deep learning models with little to no code changes, and d) is freely available to the community.

2022

Towards Enhancing Health Coaching Dialogue in Low-Resource Settings
Yue Zhou | Barbara Di Eugenio | Brian Ziebart | Lisa Sharp | Bing Liu | Ben Gerber | Nikolaos Agadakos | Shweta Yadav
Proceedings of the 29th International Conference on Computational Linguistics

Health coaching helps patients identify and accomplish lifestyle-related goals, effectively improving the control of chronic diseases and mitigating mental health conditions. However, health coaching is cost-prohibitive due to its highly personalized and labor-intensive nature. In this paper, we propose to build a dialogue system that converses with the patients, helps them create and accomplish specific goals, and can address their emotions with empathy. However, building such a system is challenging since real-world health coaching datasets are limited and empathy is subtle. Thus, we propose a modularized health coaching dialogue with simplified NLU and NLG frameworks combined with mechanism-conditioned empathetic response generation. Through automatic and human evaluation, we show that our system generates more empathetic, fluent, and coherent responses and outperforms the state-of-the-art in NLU tasks while requiring less annotation. We view our approach as a key step towards building automated and more accessible health coaching systems.

A New Public Corpus for Clinical Section Identification: MedSecId
Paul Landes | Kunal Patel | Sean S. Huang | Adam Webb | Barbara Di Eugenio | Cornelia Caragea
Proceedings of the 29th International Conference on Computational Linguistics

The process by which sections in a document are demarcated and labeled is known as section identification. Such sections are helpful to the reader when searching for information and contextualizing specific topics. The goal of this work is to segment the sections of clinical medical domain documentation. The primary contribution of this work is MedSecId, a publicly available set of 2,002 fully annotated medical notes from the MIMIC-III. We include several baselines, source code, a pretrained model and analysis of the data showing a relationship between medical concepts across sections using principal component analysis.

2021

Summarizing Behavioral Change Goals from SMS Exchanges to Support Health Coaches
Itika Gupta | Barbara Di Eugenio | Brian D. Ziebart | Bing Liu | Ben S. Gerber | Lisa K. Sharp
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Regular physical activity is associated with a reduced risk of chronic diseases such as type 2 diabetes and improved mental well-being. Yet, more than half of the US population is insufficiently active. Health coaching has been successful in promoting healthy behaviors. In this paper, we present our work towards assisting health coaches by extracting the physical activity goal the user and coach negotiate via text messages. We show that information captured by dialogue acts can help to improve the goal extraction results. We employ both traditional and transformer-based machine learning models for dialogue acts prediction and find them statistically indistinguishable in performance on our health coaching dataset. Moreover, we discuss the feedback provided by the health coaches when evaluating the correctness of the extracted goal summaries. This work is a step towards building a virtual assistant health coach to promote a healthy lifestyle.

2020

Augmenting Small Data to Classify Contextualized Dialogue Acts for Exploratory Visualization
Abhinav Kumar | Barbara Di Eugenio | Jillian Aurisano | Andrew Johnson
Proceedings of the Twelfth Language Resources and Evaluation Conference

Our goal is to develop an intelligent assistant to help users explore data via visualizations. We have collected a new corpus of conversations, CHICAGO-CRIME-VIS, geared towards supporting data visualization exploration, and we have annotated it for a variety of features, including contextualized dialogue acts. In this paper, we describe our strategies and their evaluation for dialogue act classification. We highlight how thinking aloud affects interpretation of dialogue acts in our setting and how to best capture that information. A key component of our strategy is data augmentation as applied to the training data, since our corpus is inherently small. We ran experiments with the Balanced Bagging Classifier (BAGC), Conditional Random Field (CRF), and several Long Short Term Memory (LSTM) networks, and found that all of them improved compared to the baseline (i.e., without the data augmentation pipeline). CRF outperformed the other classification algorithms, with the LSTM networks showing modest improvement, even after obtaining a performance boost from domain-trained word embeddings. This result is of note because training a CRF is far less resource-intensive than training deep learning models; hence, given a similar if not better performance, traditional methods may still be preferable in order to lower resource consumption.

A Corpus for Visual Question Answering Annotated with Frame Semantic Information
Mehrdad Alizadeh | Barbara Di Eugenio
Proceedings of the Twelfth Language Resources and Evaluation Conference

Visual Question Answering (VQA) has been widely explored as a computer vision problem; however, enhancing VQA systems with linguistic information is necessary for tackling the complexity of the task. The language understanding part can play a major role, especially for questions asking about events or actions expressed via verbs. We hypothesize that if the question focuses on events described by verbs, then the model should be aware of or trained with verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. We created a new VQA dataset annotated with verb semantic information called imSituVQA. imSituVQA is built by taking advantage of the imSitu dataset annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet.

Human-Human Health Coaching via Text Messages: Corpus, Annotation, and Analysis
Itika Gupta | Barbara Di Eugenio | Brian Ziebart | Aiswarya Baiju | Bing Liu | Ben Gerber | Lisa Sharp | Nadia Nabulsi | Mary Smart
Proceedings of the 21th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Our goal is to develop and deploy a virtual assistant health coach that can help patients set realistic physical activity goals and live a more active lifestyle. Since there is no publicly shared dataset of health coaching dialogues, the first phase of our research focused on data collection. We hired a certified health coach and 28 patients to collect the first round of human-human health coaching interaction, which took place via text messages. This resulted in 2853 messages. The data collection phase was followed by conversation analysis to gain insight into the way information exchange takes place between a health coach and a patient. This was formalized using two annotation schemas: one that focuses on the goals the patient is setting and another that models the higher-level structure of the interactions. In this paper, we discuss these schemas and briefly talk about their application for automatically extracting activity goals and annotating the second round of data, collected with different health coaches and patients. Given the resource-intensive nature of data annotation, successfully annotating a new dataset automatically is key to answering the need for high quality, large datasets.

Heart Failure Education of African American and Hispanic/Latino Patients: Data Collection and Analysis
Itika Gupta | Barbara Di Eugenio | Devika Salunke | Andrew Boyd | Paula Allen-Meares | Carolyn Dickens | Olga Garcia
Proceedings of the First Workshop on Natural Language Processing for Medical Conversations

Heart failure is a global epidemic with debilitating effects. People with heart failure need to actively participate in home self-care regimens to maintain good health. However, these regimens are not as effective as they could be and are influenced by a variety of factors. Patients from minority communities like African American (AA) and Hispanic/Latino (H/L) often have poor outcomes compared to the average Caucasian population. In this paper, we lay the groundwork to develop an interactive dialogue agent that can assist AA and H/L patients in a culturally sensitive and linguistically accurate manner with their heart health care needs. This will be achieved by extracting relevant educational concepts from the interactions between health educators and patients. Thus far we have recorded and transcribed 20 such interactions. In this paper, we describe our data collection process, thematic and initiative analysis of the interactions, and outline our future steps.

Detecting and understanding moral biases in news
Usman Shahid | Barbara Di Eugenio | Andrew Rojecki | Elena Zheleva
Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events

We describe work in progress on detecting and understanding the moral biases of news sources by combining framing theory with natural language processing. First we draw connections between issue-specific frames and moral frames that apply to all issues. Then we analyze the connection between moral frame presence and news source political leaning. We develop and test a simple classification model for detecting the presence of a moral frame, highlighting the need for more sophisticated models. We also discuss some of the annotation and frame detection challenges that can inform future research in this area.

2019

A Quantitative Analysis of Patients’ Narratives of Heart Failure
Sabita Acharya | Barbara Di Eugenio | Andrew Boyd | Richard Cameron | Karen Dunn Lopez | Pamela Martyn-Nemeth | Debaleena Chattopadhyay | Pantea Habibi | Carolyn Dickens | Haleh Vatani | Amer Ardati
Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue

Patients with chronic conditions like heart failure are the most likely to be re-hospitalized. One step towards avoiding re-hospitalization is to devise strategies for motivating patients to take care of their own health. In this paper, we perform a quantitative analysis of patients’ narratives of their experience with heart failure and explore the different topics that patients talk about. We compare two different groups of patients: those unable to take charge of their illness, and those who make efforts to improve their health. We will use the findings from our analysis to refine and personalize the summaries of hospitalizations that our system automatically generates.

2018

Towards Generating Personalized Hospitalization Summaries
Sabita Acharya | Barbara Di Eugenio | Andrew Boyd | Richard Cameron | Karen Dunn Lopez | Pamela Martyn-Nemeth | Carolyn Dickens | Amer Ardati
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

Most health documents, including patient education materials and discharge notes, are flooded with medical jargon and contain a lot of generic information about the health issue. In addition, patients are only provided with the doctor’s perspective of what happened to them in the hospital, while the care procedures performed by nurses during their entire hospital stay are nowhere included. The main focus of this research is to generate personalized hospital-stay summaries for patients by combining information from physician discharge notes and nursing plan of care. It uses a metric to identify medical concepts that are Complex, extracts definitions for the concept from three external knowledge sources, and provides the simplest definition to the patient. It also takes various features of the patient into account, like their concerns and strengths, ability to understand basic health information, level of engagement in taking care of their health, and familiarity with the health issue, and personalizes the content of the summaries accordingly. Our evaluation showed that the summaries contain 80% of the medical concepts that are considered important by both doctors and nurses. Three patient advisors (i.e., individuals who are trained in understanding patient experience extensively) verified the usability of our summaries and mentioned that they would like to get such summaries when they are discharged from hospital.

2016

Hit Songs’ Sentiments Harness Public Mood & Predict Stock Market
Rachel Harsley | Bhavesh Gupta | Barbara Di Eugenio | Huayi Li
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Towards a dialogue system that supports rich visualizations of data
Abhinav Kumar | Jillian Aurisano | Barbara Di Eugenio | Andrew Johnson | Alberto Gonzalez | Jason Leigh
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Generating summaries of hospitalizations: A new metric to assess the complexity of medical terms and their definitions
Sabita Acharya | Barbara Di Eugenio | Andrew D. Boyd | Karen Dunn Lopez | Richard Cameron | Gail M Keenan
Proceedings of the 9th International Natural Language Generation conference

2014

PatientNarr: Towards generating patient-centric summaries of hospital stays
Barbara Di Eugenio | Andrew Boyd | Camillo Lugaresi | Abhinaya Balasubramanian | Gail Keenan | Mike Burton | Tamara Goncalves Rezende Macieira | Jianrong Li | Yves Lussier
Proceedings of the 8th International Natural Language Generation Conference (INLG)

2013

Translating Italian connectives into Italian Sign Language
Camillo Lugaresi | Barbara Di Eugenio
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

UIC-CSC: The Content Selection Challenge Entry from the University of Illinois at Chicago
Hareen Venigalla | Barbara Di Eugenio
Proceedings of the 14th European Workshop on Natural Language Generation

Proceedings of the SIGDIAL 2013 Conference
Maxine Eskenazi | Michael Strube | Barbara Di Eugenio | Jason D. Williams
Proceedings of the SIGDIAL 2013 Conference

Multimodality and Dialogue Act Classification in the RoboHelper Project
Lin Chen | Barbara Di Eugenio
Proceedings of the SIGDIAL 2013 Conference

2012

Co-reference via Pointing and Haptics in Multi-Modal Dialogues
Lin Chen | Barbara Di Eugenio
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference
Barbara Di Eugenio | Susan McRoy
INLG 2012 Proceedings of the Seventh International Natural Language Generation Conference

Improving Sentence Completion in Dialogues with Multi-Modal Features
Anruo Wang | Barbara Di Eugenio | Lin Chen
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2011

Exploring Effective Dialogue Act Sequences in One-on-one Computer Science Tutoring Dialogues
Lin Chen | Barbara Di Eugenio | Davide Fossati | Stellan Ohlsson | David Cosejo
Proceedings of the Sixth Workshop on Innovative Use of NLP for Building Educational Applications

Improving Pronominal and Deictic Co-Reference Resolution with Multi-Modal Features
Lin Chen | Anruo Wang | Barbara Di Eugenio
Proceedings of the SIGDIAL 2011 Conference

2010

Generating Fine-Grained Reviews of Songs from Album Reviews
Swati Tata | Barbara Di Eugenio
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

KSC-PaL: A Peer Learning Agent that Encourages Students to take the Initiative
Cynthia Kersey | Barbara Di Eugenio | Pamela Jordan | Sandra Katz
Proceedings of the NAACL HLT 2010 Demonstration Session

A Lucene and Maximum Entropy Model Based Hedge Detection System
Lin Chen | Barbara Di Eugenio
Proceedings of the Fourteenth Conference on Computational Natural Language Learning – Shared Task

Analysis and Presentation of Results for Mobile Local Search
Alberto Tretti | Barbara Di Eugenio
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Aggregation of long lists of concepts is important to avoid overwhelming a small display. Focusing on the domain of mobile local search, this paper presents the development of an application to perform filtering and aggregation of results obtained through the Yahoo! Local web service. First, we performed an analysis of the data available through Yahoo! Local by crawling its database with over 170 thousand local listings located in Chicago. Then, we compiled resources and developed algorithms to filter and aggregate local search results. The methods developed exploit Yahoo!’s listings categorization to reduce the result space and pinpoint the category containing the most relevant results. Finally, we evaluated a prototype through a user study, which pitted our system against Yahoo! Local and against a plain list of search results. The results obtained from the study show that our aggregation methods are quite effective, cutting down the number of entries returned to the user by 43% on average, but leaving search efficiency and user satisfaction unaffected.

2009

An effective Discourse Parser that uses Rich Linguistic Information
Rajen Subba | Barbara Di Eugenio
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

KSC-PaL: A Peer Learning Agent that Encourages Students to take the Initiative
Cynthia Kersey | Barbara Di Eugenio | Pamela Jordan | Sandra Katz
Proceedings of the Fourth Workshop on Innovative Use of NLP for Building Educational Applications

2008

Simple but effective feedback generation to tutor abstract problem solving
Xin Lu | Barbara Di Eugenio | Stellan Ohlsson | Davide Fossati
Proceedings of the Fifth International Natural Language Generation Conference

From Extracting to Abstracting: Generating Quasi-abstractive Summaries
Zhuli Xie | Barbara Di Eugenio | Peter C. Nelson
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we investigate quasi-abstractive summaries, a new type of machine-generated summaries that do not use whole sentences, but only fragments from the source. Quasi-abstractive summaries aim at bridging the gap between human-written abstracts and extractive summaries. We present an approach that learns how to identify sets of sentences, where each set contains fragments that can be used to produce one sentence in the abstract; and then uses these sets to produce the abstract itself. Our experiments show very promising results. Importantly, we obtain our best results when the summary generation is anchored by the most salient Noun Phrases predicted from the text to be summarized.

I saw TREE trees in the park: How to Correct Real-Word Spelling Mistakes
Davide Fossati | Barbara Di Eugenio
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents a context sensitive spell checking system that uses mixed trigram models, and introduces a new empirically grounded method for building confusion sets. The proposed method has been implemented, tested, and evaluated in terms of coverage, precision, and recall. The results show that the method is effective.

2006

Building lexical resources for PrincPar, a large coverage parser that generates principled semantic representations
Rajen Subba | Barbara Di Eugenio | Elena Terenzi
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Parsing, one of the more successful areas of Natural Language Processing, has mostly been concerned with syntactic structure. Though uncovering the syntactic structure of sentences is very important, in many applications a meaning representation for the input must be derived as well. We report on PrincPar, a parser that builds full meaning representations. It integrates LCFLEX, a robust parser, with a lexicon and ontology derived from two lexical resources, VerbNet and CoreLex, that represent the semantics of verbs and nouns respectively. We show that these two different lexical resources that focus on verbs and nouns can be successfully integrated. We report parsing results on a corpus of instructional text and assess the coverage of those lexical resources. Our evaluation metric is the number of verb frames that are assigned a correct semantics: 72.2% verb frames are assigned a perfect semantics, and another 10.9% are assigned a partially correct semantics. Our ultimate goal is to develop a (semi)automatic method to derive domain knowledge from instructional text, in the form of linguistically motivated action schemes.

The problem of ontology alignment on the Web: A first report
Davide Fossati | Gabriele Ghidoni | Barbara Di Eugenio | Isabel Cruz | Huiyong Xiao | Rajen Subba
Proceedings of the 2nd International Workshop on Web as Corpus

Discourse Parsing: Learning FOL Rules based on Rich Verb Semantic Representations to automatically label Rhetorical Relations
Rajen Subba | Barbara Di Eugenio | Su Nam Kim
Proceedings of the Workshop on Learning Structured Information in Natural Language Applications

2005

Aggregation Improves Learning: Experiments in Natural Language Generation for Intelligent Tutoring Systems
Barbara Di Eugenio | Davide Fossati | Dan Yu | Susan Haller | Michael Glass
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

2004

FLSA: Extending Latent Semantic Analysis with Features for Dialogue Act Classification
Riccardo Serafin | Barbara Di Eugenio
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

pdf bib
Using Gene Expression Programming to Construct Sentence Ranking Functions for Text Summarization
Zhuli Xie | Xin Li | Barbara Di Eugenio | Weimin Xiao | Thomas M. Tirpak | Peter C. Nelson
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib
Squibs and Discussions: The Kappa Statistic: A Second Look
Barbara Di Eugenio | Michael Glass
Computational Linguistics, Volume 30, Number 1, March 2004

pdf bib
Centering: A Parametric Theory and Its Instantiations
Massimo Poesio | Rosemary Stevenson | Barbara Di Eugenio | Janet Hitzeman
Computational Linguistics, Volume 30, Number 3, September 2004

2003

pdf bib
Latent Semantic Analysis for Dialogue Act Classification
Riccardo Serafin | Barbara Di Eugenio | Michael Glass
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

pdf bib
Building lexical semantic representations for Natural Language instructions
Elena Terenzi | Barbara Di Eugenio
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

2002

pdf bib
MUP - The UIC Standoff Markup Tool
Michael Glass | Barbara Di Eugenio
Proceedings of the Third SIGdial Workshop on Discourse and Dialogue

pdf bib
The DIAG experiments: Natural Language Generation for Intelligent Tutoring Systems
Barbara Di Eugenio | Michael Glass | Michael Trolio
Proceedings of the International Natural Language Generation Conference

pdf bib
The binomial cumulative distribution function, or, is my system better than yours?
Barbara Di Eugenio | Michael Glass | Michael J. Scott
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

pdf bib
On the Usage of Kappa to Evaluate Agreement on Coding Tasks
Barbara Di Eugenio
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

pdf bib
Book Reviews: Lexical Semantics and Knowledge Representation in Multilingual Text Generation
Barbara Di Eugenio
Computational Linguistics, Volume 26, Number 2, June 2000

1998

pdf bib
Introduction to the Special Issue on Natural Language Generation
Robert Dale | Barbara Di Eugenio | Donia Scott
Computational Linguistics, Volume 24, Number 3, September 1998

pdf bib
An Empirical Investigation of Proposals in Collaborative Dialogues
Barbara Di Eugenio | Pamela W. Jordan | Johanna D. Moore | Richmond H. Thomason
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

pdf bib
An Empirical Investigation of Proposals in Collaborative Dialogues
Barbara Di Eugenio | Pamela W. Jordan | Johanna D. Moore | Richmond H. Thomason
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

1997

pdf bib
Learning Features that Predict Cue Usage
Barbara Di Eugenio | Johanna D. Moore | Massimo Paolucci
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1996

pdf bib
Learning Micro-Planning Rules for Preventive Expressions
Keith Vander Linden | Barbara Di Eugenio
Eighth International Natural Language Generation Workshop

pdf bib
Generating ‘Distributed’ Referring Expressions: an Initial Report
Barbara Di Eugenio | Johanna D. Moore
Eighth International Natural Language Generation Workshop (Posters and Demonstrations)

pdf bib
A Corpus Study of Negative Imperatives in Natural Language Instructions
Keith Vander Linden | Barbara Di Eugenio
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

pdf bib
The discourse functions of Italian subjects: a centering approach
Barbara Di Eugenio
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

pdf bib
Using Discourse Predictions for Ambiguity Resolution
Yan Qu | Carolyn P. Rose | Barbara Di Eugenio
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

1995

pdf bib
Discourse Processing of Dialogues with Multiple Threads
Carolyn Penstein Rosé | Barbara Di Eugenio | Lori S. Levin | Carol Van Ess-Dykema
33rd Annual Meeting of the Association for Computational Linguistics

1993

pdf bib
Speaker’s Intentions and Beliefs in Negative Imperatives
Barbara Di Eugenio
Intentionality and Structure in Discourse Relations

1992

pdf bib
On the Interpretation of Natural Language Instructions
Barbara Di Eugenio | Michael White
COLING 1992 Volume 4: The 14th International Conference on Computational Linguistics

pdf bib
Understanding Natural Language Instructions: The Case of Purpose Clauses
Barbara Di Eugenio
30th Annual Meeting of the Association for Computational Linguistics

1991

pdf bib
Action representation for NL instructions
Barbara Di Eugenio
29th Annual Meeting of the Association for Computational Linguistics

1990

pdf bib
Centering theory and the Italian pronominal system
Barbara Di Eugenio
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

pdf bib
Free Adjuncts in Natural Language Instructions
Bonnie Lynn Webber | Barbara Di Eugenio
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

1986

pdf bib
A Logical Formalism for the Representation of Determiners
Barbara Di Eugenio | Leonardo Lesmo | Paolo Pogliano | Pietro Torasso | Francesco Urbano
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics
