Roxana Girju

Also published as: Roxana Gîrju


2023

Investigating Stylistic Profiles for the Task of Empathy Classification in Medical Narrative Essays
Priyanka Dey | Roxana Girju
Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023)

One important aspect of language is how speakers generate utterances and texts to convey their intended meanings. In this paper, we bring various aspects of the Construction Grammar (CxG) and Systemic Functional Grammar (SFG) theories into a deep learning computational framework to model empathic language. Our corpus consists of 440 essays written by premed students as narrated simulated patient–doctor interactions. We start with baseline classifiers (state-of-the-art recurrent neural networks and transformer models). Then, we enrich these models with a set of linguistic constructions, demonstrating the importance of this novel approach to the task of empathy classification for this dataset. Our results indicate the potential of such constructions to contribute to the overall empathy profile of first-person narrative essays.

2022

Syn2Vec: Synset Colexification Graphs for Lexical Semantic Similarity
John Harvill | Roxana Girju | Mark Hasegawa-Johnson
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

In this paper we focus on patterns of colexification (co-expressions of form-meaning mapping in the lexicon) as an aspect of lexical-semantic organization, and use them to build large-scale synset graphs across BabelNet’s typologically diverse set of 499 world languages. We introduce and compare several approaches: monolingual and cross-lingual colexification graphs, popular distributional models, and fusion approaches. The models are evaluated against human judgments on a semantic similarity task for nine languages. Our strong empirical findings also point to the importance of the universality of our graph synset embedding representations, which require no language-specific adaptation when evaluated on the lexical similarity task. The insights of our exploratory investigation of large-scale colexification graphs could inspire significant advances in NLP across languages, especially for tasks involving languages that lack dedicated lexical resources and can benefit from language transfer from large shared cross-lingual semantic spaces.

Design Considerations for an NLP-Driven Empathy and Emotion Interface for Clinician Training via Telemedicine
Roxana Girju | Marina Girju
Proceedings of the Second Workshop on Bridging Human–Computer Interaction and Natural Language Processing

As digital social platforms and mobile technologies become more prevalent and robust, the use of Artificial Intelligence (AI) in facilitating human communication will grow. This, in turn, will encourage the development of intuitive, adaptive, and effective empathic AI interfaces that better address the needs of socially and culturally diverse communities. In this paper, we present several design considerations for an intelligent digital interface intended to guide clinicians toward more empathetic communication. This approach allows various communities of practice to investigate how AI, on one side, and human communication and healthcare needs, on the other, can contribute to each other’s development.

Enriching Deep Learning with Frame Semantics for Empathy Classification in Medical Narrative Essays
Priyanka Dey | Roxana Girju
Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)

Empathy is a vital component of health care and plays a key role in the training of future doctors. Paying attention to medical students’ self-reflective stories of their interactions with patients can encourage empathy and the formation of professional identities that embody desirable values such as integrity and respect. We present a computational approach and linguistic analysis of empathic language in a large corpus of 440 essays written by pre-med students as narrated simulated patient–doctor interactions. We analyze the discourse of three kinds of empathy (cognitive, affective, and prosocial) as highlighted by expert annotators. We also present various experiments with state-of-the-art recurrent neural networks and transformer models for classifying these forms of empathy. To improve further on these results, we develop a novel system architecture that makes use of frame semantics to enrich our state-of-the-art models. We show that this novel framework leads to significant improvement on the empathy classification task for this dataset.

2018

UIUC at SemEval-2018 Task 1: Recognizing Affect with Ensemble Models
Abhishek Avinash Narwekar | Roxana Girju
Proceedings of the 12th International Workshop on Semantic Evaluation

Our submission to the SemEval-2018 Task 1: Affect in Tweets shared task competition is a supervised learning model relying on standard lexicon features coupled with word embedding features. We used an ensemble of diverse models, including random forests, gradient boosted trees, and linear models, corrected for training-development set mismatch. We submitted the system’s output for subtasks 1 (emotion intensity prediction), 2 (emotion ordinal classification), 3 (valence intensity regression) and 4 (valence ordinal classification), for English tweets. We placed 25th, 19th, 24th and 15th in the four subtasks, respectively. The baseline considered was an SVM (Support Vector Machine) model with a linear kernel on the lexicon- and embedding-based features. Our system’s final performance, measured in Pearson correlation scores, outperformed the baseline by a margin of 2.2% to 14.6% across all tasks.

2016

Psycholinguistic Features for Deceptive Role Detection in Werewolf
Codruta Girlea | Roxana Girju | Eyal Amir
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

Detecting Causally Embedded Structures Using an Evolutionary Algorithm
Chen Li | Roxana Girju
Proceedings of the 3rd Workshop on EVENTS: Definition, Detection, Coreference, and Representation

2014

Recognizing Causality in Verb-Noun Pairs via Noun and Verb Semantics
Mehwish Riaz | Roxana Girju
Proceedings of the EACL 2014 Workshop on Computational Approaches to Causality in Language (CAtoCL)

Unsupervised Construction of a Lexicon and a Repository of Variation Patterns for Arabic Modal Multiword Expressions
Rania Al-Sabbagh | Roxana Girju | Jana Diesner
Proceedings of the 10th Workshop on Multiword Expressions (MWE)

In-depth Exploitation of Noun and Verb Semantics to Identify Causation in Verb-Noun Pairs
Mehwish Riaz | Roxana Girju
Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL)

Interactive Annotation for Event Modality in Modern Standard and Egyptian Arabic Tweets
Rania Al-Sabbagh | Roxana Girju | Jana Diesner
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

3arif: A Corpus of Modern Standard and Egyptian Arabic Tweets Annotated for Epistemic Modality Using Interactive Crowdsourcing
Rania Al-Sabbagh | Roxana Girju | Jana Diesner
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Toward a Better Understanding of Causality between Verbal Events: Extraction and Analysis of the Causal Power of Verb-Verb Associations
Mehwish Riaz | Roxana Girju
Proceedings of the SIGDIAL 2013 Conference

Using the Semantic-Syntactic Interface for Reliable Arabic Modality Annotation
Rania Al-Sabbagh | Jana Diesner | Roxana Girju
Proceedings of the Sixth International Joint Conference on Natural Language Processing

2012

YADAC: Yet another Dialectal Arabic Corpus
Rania Al-Sabbagh | Roxana Girju
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents the first phase of building YADAC, a multi-genre Dialectal Arabic (DA) corpus compiled from Web data: microblogs (i.e. Twitter), blogs/forums, and online knowledge market services in which both questions and answers are user-generated. In addition to introducing two new genres to the current efforts of building DA corpora (i.e. microblogs and question-answer pairs extracted from online knowledge market services), the paper highlights and tackles several new issues related to building DA corpora that have not been handled in previous studies: function-based Web harvesting and dialect identification, vowel-based spelling variation, linguistic hypercorrection and its effect on spelling variation, and unsupervised Part-of-Speech (POS) tagging and base-phrase chunking for DA. Although the algorithms for both POS tagging and base-phrase chunking are still under development, the results are promising.

2010

Summarizing Contrastive Viewpoints in Opinionated Text
Michael Paul | ChengXiang Zhai | Roxana Girju
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Mining the Web for the Induction of a Dialectical Arabic Lexicon
Rania Al-Sabbagh | Roxana Girju
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes the first phase of building a lexicon of Egyptian Cairene Arabic (ECA), one of the most widely understood dialects in the Arab World, and Modern Standard Arabic (MSA). Each ECA entry is mapped to its MSA synonym, Part-of-Speech (POS) tag, and top-ranked contexts based on Web queries; each entry is thus provided with basic syntactic and semantic information for a generic lexicon compatible with multiple NLP applications. Moreover, through their MSA synonyms, ECA entries gain access to the NLP tools and resources available for MSA, which are considerable. Using an associationist approach based on the correlations between word co-occurrence patterns in both dialects, we change the direction of the acquisition process from parallel to circular to overcome a bottleneck of current research on Arabic dialects, namely the lack of parallel corpora, and to improve accuracy rates when using unrelated Web documents, which are more frequently available. Manually evaluated on 1,000 word entries by two native speakers of the ECA-MSA varieties, the proposed approach achieves a promising F-measure of 70.9%. In discussing the proposed algorithm, we highlight different semantic issues for upcoming phases of the induction of a more comprehensive ECA-MSA lexicon.

2009

The Syntax and Semantics of Prepositions in the Task of Automatic Interpretation of Nominal Phrases and Compounds: A Cross-Linguistic Study
Roxana Girju
Computational Linguistics, Volume 35, Number 2, June 2009 - Special Issue on Prepositions

Topic Modeling of Research Fields: An Interdisciplinary Perspective
Michael Paul | Roxana Girju
Proceedings of the International Conference RANLP-2009

Identifying Semantic Relations in Context: Near-misses and Overlaps
Alla Rozovskaya | Roxana Girju
Proceedings of the International Conference RANLP-2009

Cross-Cultural Analysis of Blogs and Forums with Mixed-Collection Topic Models
Michael Paul | Roxana Girju
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Mining the Web for Reciprocal Relationships
Michael Paul | Roxana Girju | Chen Li
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

Investigating Automatic Alignment Methods for Slide Generation from Academic Papers
Brandon Beamer | Roxana Girju
Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)

Panel Summary: Educating and Assessing the Human Translator in an Age of Technology
Patricia Phillips-Batoma | Roxana Girju | Elizabeth Lowe | Patricia Minacori
Proceedings of Machine Translation Summit XII: Plenaries

2008

Book Review: Mathematical Linguistics by András Kornai
Richard Sproat | Roxana Gîrju
Computational Linguistics, Volume 34, Number 4, December 2008

2007

SemEval-2007 Task 04: Classification of Semantic Relations between Nominals
Roxana Girju | Preslav Nakov | Vivi Nastase | Stan Szpakowicz | Peter Turney | Deniz Yuret
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

UIUC: A Knowledge-rich Approach to Identifying Semantic Relations between Nominals
Brandon Beamer | Suma Bhat | Brant Chee | Andrew Fister | Alla Rozovskaya | Roxana Girju
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

Improving the Interpretation of Noun Phrases with Cross-linguistic Information
Roxana Girju
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Experiments with an Annotation Scheme for a Knowledge-rich Noun Phrase Interpretation System
Roxana Girju
Proceedings of the Linguistic Annotation Workshop

2006

Automatic Discovery of Part-Whole Relations
Roxana Girju | Adriana Badulescu | Dan Moldovan
Computational Linguistics, Volume 32, Number 1, March 2006

2004

SVM classification of FrameNet semantic roles
Dan Moldovan | Roxana Gîrju | Marian Olteanu | Ovidiu Fortu
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Models for the Semantic Classification of Noun Phrases
Dan Moldovan | Adriana Badulescu | Marta Tatu | Daniel Antohe | Roxana Girju
Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004

Support Vector Machines Applied to the Classification of Semantic Relations in Nominalized Noun Phrases
Roxana Girju | Ana-Maria Giuglea | Marian Olteanu | Ovidiu Fortu | Orest Bolohan | Dan Moldovan
Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004

2003

Learning Semantic Constraints for the Automatic Discovery of Part-Whole Relations
Roxana Girju | Adriana Badulescu | Dan Moldovan
Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics

Discovery of Manner Relations and Their Applicability to Question Answering
Roxana Girju | Manju Putcha | Dan Moldovan
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

Automatic Detection of Causal Relations for Question Answering
Roxana Girju
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

2001

The Role of Lexico-Semantic Feedback in Open-Domain Textual Question-Answering
Sanda Harabagiu | Dan Moldovan | Marius Pasca | Rada Mihalcea | Mihai Surdeanu | Razvan Bunescu | Roxana Girju | Vasile Rus | Paul Morarescu
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

Domain-Specific Knowledge Acquisition from Text
Dan Moldovan | Roxana Girju | Vasile Rus
Sixth Applied Natural Language Processing Conference

The Structure and Performance of an Open-Domain Question Answering System
Dan Moldovan | Sanda Harabagiu | Marius Pasca | Rada Mihalcea | Roxana Girju | Richard Goodrum | Vasile Rus
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics