Aoife Cahill


2024

HumVI: A Multilingual Dataset for Detecting Violent Incidents Impacting Humanitarian Aid
Hemank Lamba | Anton Abilov | Ke Zhang | Elizabeth M Olson | Henry Kudzanai Dambanemuya | João Cordovil Bárcia | David S. Batista | Christina Wille | Aoife Cahill | Joel R. Tetreault | Alejandro Jaimes
Findings of the Association for Computational Linguistics: EMNLP 2024

Humanitarian organizations can enhance their effectiveness by analyzing data to discover trends, gather aggregated insights, manage their security risks, support decision-making, and inform advocacy and funding proposals. However, data about violent incidents with direct impact and relevance for humanitarian aid operations is not readily available. An automatic data collection and NLP-backed classification framework aligned with humanitarian perspectives can help bridge this gap. In this paper, we present HumVI – a dataset comprising news articles in three languages (English, French, Arabic) containing instances of different types of violent incidents categorized by the humanitarian sector they impact, e.g., aid security, education, food security, health, and protection. Reliable labels were obtained for the dataset by partnering with a data-backed humanitarian organization, Insecurity Insight. We provide multiple benchmarks for the dataset, employing various deep learning architectures and techniques, including data augmentation and mask loss, to address different task-related challenges, e.g., domain expansion. The dataset is publicly available at https://github.com/dataminr-ai/humvi-dataset.

2023

A New Task and Dataset on Detecting Attacks on Human Rights Defenders
Shihao Ran | Di Lu | Aoife Cahill | Joel Tetreault | Alejandro Jaimes
Findings of the Association for Computational Linguistics: ACL 2023

The ability to conduct retrospective analyses of attacks on human rights defenders over time and by location is important for humanitarian organizations to better understand historical or ongoing human rights violations and thus better manage the global impact of such events. We hypothesize that NLP can support such efforts by quickly processing large collections of news articles to detect and summarize the characteristics of attacks on human rights defenders. To that end, we propose a new dataset for detecting Attacks on Human Rights Defenders (HRDsAttack) consisting of crowdsourced annotations on 500 online news articles. The annotations include fine-grained information about the type and location of the attacks, as well as information about the victim(s). We demonstrate the usefulness of the dataset by using it to train and evaluate baseline models on several sub-tasks to predict the annotated characteristics.

2021

Supporting Spanish Writers using Automated Feedback
Aoife Cahill | James Bruno | James Ramey | Gilmar Ayala Meneses | Ian Blood | Florencia Tolentino | Tamar Lavee | Slava Andreyev
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations

We present a tool that provides automated feedback to students studying Spanish writing. The feedback is given for four categories: topic development, coherence, writing conventions, and essay organization. The tool is made freely available via a Google Docs add-on. A small user study with third-level students in Mexico shows that students found the tool generally helpful and that most of them plan to continue using it as they work to improve their writing skills.

2020

Using PRMSE to evaluate automated scoring systems in the presence of label noise
Anastassia Loukina | Nitin Madnani | Aoife Cahill | Lili Yao | Matthew S. Johnson | Brian Riordan | Daniel F. McCaffrey
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

The effect of noisy labels on the performance of NLP systems has been studied extensively for system training. In this paper, we focus on the effect that noisy labels have on system evaluation. Using automated scoring as an example, we demonstrate that the quality of human ratings used for system evaluation has a substantial impact on traditional performance metrics, making it impossible to compare system evaluations on labels with different quality. We propose that a new metric, PRMSE, developed within the educational measurement community, can help address this issue, and provide practical guidelines on using PRMSE.

Context-based Automated Scoring of Complex Mathematical Responses
Aoife Cahill | James H Fife | Brian Riordan | Avijit Vajpayee | Dmytro Galochkin
Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications

The tasks of automatically scoring either textual or algebraic responses to mathematical questions have both been well-studied, albeit separately. In this paper we propose a method for automatically scoring responses that contain both text and algebraic expressions. Our method not only achieves high agreement with human raters, but also links explicitly to the scoring rubric – essentially providing explainable models and a way to potentially provide feedback to students in the future.

Don’t take “nswvtnvakgxpm” for an answer –The surprising vulnerability of automatic content scoring systems to adversarial input
Yuning Ding | Brian Riordan | Andrea Horbach | Aoife Cahill | Torsten Zesch
Proceedings of the 28th International Conference on Computational Linguistics

Automatic content scoring systems are widely used on short answer tasks to save human effort. However, the use of these systems can invite cheating strategies, such as students writing irrelevant answers in the hopes of gaining at least partial credit. We generate adversarial answers for benchmark content scoring datasets based on different methods of increasing sophistication and show that even simple methods lead to a surprising decrease in content scoring performance. As an extreme example, up to 60% of adversarial answers generated from random shuffling of words in real answers are accepted by a state-of-the-art scoring system. In addition to analyzing the vulnerabilities of content scoring systems, we examine countermeasures such as adversarial training and show that these measures improve system robustness against adversarial answers considerably but do not suffice to completely solve the problem.

2018

Automated Scoring: Beyond Natural Language Processing
Nitin Madnani | Aoife Cahill
Proceedings of the 27th International Conference on Computational Linguistics

In this position paper, we argue that building operational automated scoring systems is a task that has disciplinary complexity above and beyond standard competitive shared tasks, which usually involve applying the latest machine learning techniques to publicly available data in order to obtain the best accuracy. Automated scoring systems warrant significant cross-discipline collaboration, of which natural language processing and machine learning are just two of many important components. Such systems have multiple stakeholders with different but valid perspectives that can often be at odds with each other. Our position is that it is essential for us as NLP researchers to understand and incorporate these perspectives in our research and work towards a mutually satisfactory solution in order to build automated scoring systems that are accurate, fair, unbiased, and useful.

Atypical Inputs in Educational Applications
Su-Youn Yoon | Aoife Cahill | Anastassia Loukina | Klaus Zechner | Brian Riordan | Nitin Madnani
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers)

In large-scale educational assessments, the use of automated scoring has recently become quite common. While the majority of student responses can be processed and scored without difficulty, there are a small number of responses that have atypical characteristics that make it difficult for an automated scoring system to assign a correct score. We describe a pipeline that detects and processes these kinds of responses at run-time. We present the most frequent kinds of what are called non-scorable responses along with effective filtering models based on various NLP and speech processing technologies. We give an overview of two operational automated scoring systems, one for essay scoring and one for speech scoring, and describe the filtering models they use. Finally, we present an evaluation and analysis of filtering models used for spoken responses in an assessment of language proficiency.

2017

Building Better Open-Source Tools to Support Fairness in Automated Scoring
Nitin Madnani | Anastassia Loukina | Alina von Davier | Jill Burstein | Aoife Cahill
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing

Automated scoring of written and spoken responses is an NLP application that can significantly impact lives, especially when deployed as part of high-stakes tests such as the GRE® and the TOEFL®. Ethical considerations require that automated scoring algorithms treat all test-takers fairly. The educational measurement community has done significant research on fairness in assessments, and automated scoring systems must incorporate their recommendations. The best way to do that is by making available automated, non-proprietary tools to NLP researchers that directly incorporate these recommendations and generate the analyses needed to help identify and resolve biases in their scoring systems. In this paper, we attempt to provide such a solution.

Speech- and Text-driven Features for Automated Scoring of English Speaking Tasks
Anastassia Loukina | Nitin Madnani | Aoife Cahill
Proceedings of the Workshop on Speech-Centric Natural Language Processing

We consider the automatic scoring of a task for which both the content of the response as well as its spoken fluency are important. We combine features from a text-only content scoring system originally designed for written responses with several categories of acoustic features. Although adding any single category of acoustic features to the text-only system on its own does not significantly improve performance, adding all acoustic features together does yield a small but significant improvement. These results are consistent for responses to open-ended questions and to questions focused on some given source material.

A Report on the 2017 Native Language Identification Shared Task
Shervin Malmasi | Keelan Evanini | Aoife Cahill | Joel Tetreault | Robert Pugh | Christopher Hamill | Diane Napolitano | Yao Qian
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Native Language Identification (NLI) is the task of automatically identifying the native language (L1) of an individual based on their language production in a learned language. It is typically framed as a classification task where the set of L1s is known a priori. Two previous shared tasks on NLI have been organized where the aim was to identify the L1 of learners of English based on essays (2013) and spoken responses (2016) they provided during a standardized assessment of academic English proficiency. The 2017 shared task combines the inputs from the two prior tasks for the first time. There are three tracks: NLI on the essay only, NLI on the spoken response only (based on a transcription of the response and i-vector acoustic features), and NLI using both responses. We believe this makes for a more interesting shared task while building on the methods and results from the previous two shared tasks. In this paper, we report the results of the shared task. A total of 19 teams competed across the three different sub-tasks. The fusion track showed that combining the written and spoken responses provides a large boost in prediction accuracy. Multiple classifier systems (e.g. ensembles and meta-classifiers) were the most effective in all tasks, with most based on traditional classifiers (e.g. SVMs) with lexical/syntactic features.

Investigating neural architectures for short answer scoring
Brian Riordan | Andrea Horbach | Aoife Cahill | Torsten Zesch | Chong Min Lee
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

Neural approaches to automated essay scoring have recently shown state-of-the-art performance. The automated essay scoring task typically involves a broad notion of writing quality that encompasses content, grammar, organization, and conventions. This differs from the short answer content scoring task, which focuses on content accuracy. The inputs to neural essay scoring models – ngrams and embeddings – are arguably well-suited to evaluate content in short answer scoring tasks. We investigate how several basic neural approaches similar to those used for automated essay scoring perform on short answer scoring. We show that neural architectures can outperform a strong non-neural baseline, but performance and optimal parameter settings vary across the more diverse types of prompts typical of short answer scoring.

A Large Scale Quantitative Exploration of Modeling Strategies for Content Scoring
Nitin Madnani | Anastassia Loukina | Aoife Cahill
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications

We explore various supervised learning strategies for automated scoring of content knowledge for a large corpus of 130 different content-based questions spanning four subject areas (Science, Math, English Language Arts, and Social Studies) and containing over 230,000 responses scored by human raters. Based on our analyses, we provide specific recommendations for content scoring. These are based on patterns observed across multiple questions and assessments and are, therefore, likely to generalize to other scenarios and prove useful to the community as automated content scoring becomes more popular in schools and classrooms.

2016

String Kernels for Native Language Identification: Insights from Behind the Curtains
Radu Tudor Ionescu | Marius Popescu | Aoife Cahill
Computational Linguistics, Volume 42, Issue 3 - September 2016

The Effect of Multiple Grammatical Errors on Processing Non-Native Writing
Courtney Napoles | Aoife Cahill | Nitin Madnani
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Automated scoring across different modalities
Anastassia Loukina | Aoife Cahill
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Model Combination for Correcting Preposition Selection Errors
Nitin Madnani | Michael Heilman | Aoife Cahill
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Automatically Scoring Tests of Proficiency in Music Instruction
Nitin Madnani | Aoife Cahill | Brian Riordan
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

2015

Measuring Feature Diversity in Native Language Identification
Shervin Malmasi | Aoife Cahill
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

Reducing Annotation Efforts in Supervised Short Answer Scoring
Torsten Zesch | Michael Heilman | Aoife Cahill
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

Preliminary Experiments on Crowdsourced Evaluation of Feedback Granularity
Nitin Madnani | Martin Chodorow | Aoife Cahill | Melissa Lopez | Yoko Futagi | Yigal Attali
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

Parsing Learner Text: to Shoehorn or not to Shoehorn
Aoife Cahill
Proceedings of the 9th Linguistic Annotation Workshop

2014

Can characters reveal your native language? A language-independent approach to native language identification
Radu Tudor Ionescu | Marius Popescu | Aoife Cahill
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

How to Produce Unseen Teddy Bears: Improved Morphological Processing of Compounds in SMT
Fabienne Cap | Alexander Fraser | Marion Weller | Aoife Cahill
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics

Predicting Grammaticality on an Ordinal Scale
Michael Heilman | Aoife Cahill | Nitin Madnani | Melissa Lopez | Matthew Mulholland | Joel Tetreault
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

An Explicit Feedback System for Preposition Errors based on Wikipedia Revisions
Nitin Madnani | Aoife Cahill
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications

Proceedings of the 8th International Natural Language Generation Conference (INLG)
Margaret Mitchell | Kathleen McCoy | David McDonald | Aoife Cahill
Proceedings of the 8th International Natural Language Generation Conference (INLG)

Proceedings of the INLG and SIGDIAL 2014 Joint Session
Margaret Mitchell | Kathleen McCoy | David McDonald | Aoife Cahill
Proceedings of the INLG and SIGDIAL 2014 Joint Session

Self-Training for Parsing Learner Text
Aoife Cahill | Binod Gyawali | James Bruno
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages

2013

Robust Systems for Preposition Error Correction Using Wikipedia Revisions
Aoife Cahill | Nitin Madnani | Joel Tetreault | Diane Napolitano
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

ParaQuery: Making Sense of Paraphrase Collections
Lili Kotlerman | Nitin Madnani | Aoife Cahill
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

A Report on the First Native Language Identification Shared Task
Joel Tetreault | Daniel Blanchard | Aoife Cahill
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

Detecting Missing Hyphens in Learner Text
Aoife Cahill | Martin Chodorow | Susanne Wolff | Nitin Madnani
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

2012

Native Tongues, Lost and Found: Resources and Empirical Evaluations in Native Language Identification
Joel Tetreault | Daniel Blanchard | Aoife Cahill | Martin Chodorow
Proceedings of COLING 2012

Modeling Inflection and Word-Formation in SMT
Alexander Fraser | Marion Weller | Aoife Cahill | Fabienne Cap
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

To what extent does sentence-internal realisation reflect discourse context? A study on word order
Sina Zarrieß | Aoife Cahill | Jonas Kuhn
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Automatically Acquiring Fine-Grained Information Status Distinctions in German
Aoife Cahill | Arndt Riester
Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Precision Isn’t Everything: A Hybrid Approach to Grammatical Error Detection
Michael Heilman | Aoife Cahill | Joel Tetreault
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

2011

Underspecifying and Predicting Voice for Surface Realisation Ranking
Sina Zarrieß | Aoife Cahill | Jonas Kuhn
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

Cross-Lingual Induction for Deep Broad-Coverage Syntax: A Case Study on German Participles
Sina Zarrieß | Aoife Cahill | Jonas Kuhn | Christian Rohrer
Coling 2010: Posters

A Cross-Lingual Induction Technique for German Adverbial Participles
Sina Zarrieß | Aoife Cahill | Jonas Kuhn | Christian Rohrer
Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground

2009

Human Evaluation of a German Surface Realisation Ranker
Aoife Cahill | Martin Forst
Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009)

Incorporating Information Status into Generation Ranking
Aoife Cahill | Arndt Riester
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Correlating Human and Automatic Evaluation of a German Surface Realiser
Aoife Cahill
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

Wide-Coverage Deep Statistical Parsing Using Automatic Dependency Structure Annotation
Aoife Cahill | Michael Burke | Ruth O’Donovan | Stefan Riezler | Josef van Genabith | Andy Way
Computational Linguistics, Volume 34, Number 1, March 2008

Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation
Johan Bos | Edward Briscoe | Aoife Cahill | John Carroll | Stephen Clark | Ann Copestake | Dan Flickinger | Josef van Genabith | Julia Hockenmaier | Aravind Joshi | Ronald Kaplan | Tracy Holloway King | Sandra Kuebler | Dekang Lin | Jan Tore Lønning | Christopher Manning | Yusuke Miyao | Joakim Nivre | Stephan Oepen | Kenji Sagae | Nianwen Xue | Yi Zhang
Coling 2008: Proceedings of the workshop on Cross-Framework and Cross-Domain Parser Evaluation

Speeding up LFG Parsing Using C-Structure Pruning
Aoife Cahill | John T. Maxwell III | Paul Meurer | Christian Rohrer | Victoria Rosén
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

2007

Exploiting Multi-Word Units in History-Based Probabilistic Generation
Deirdre Hogan | Conor Cafferkey | Aoife Cahill | Josef van Genabith
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Pruning the Search Space of a Hand-Crafted Parsing System with a Probabilistic Parser
Aoife Cahill | Tracy Holloway King | John T. Maxwell III
ACL 2007 Workshop on Deep Linguistic Processing

Stochastic Realisation Ranking for a Free Word Order Language
Aoife Cahill | Martin Forst | Christian Rohrer
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

2006

QuestionBank: Creating a Corpus of Parse-Annotated Questions
John Judge | Aoife Cahill | Josef van Genabith
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Robust PCFG-Based Generation Using Automatically Acquired LFG Approximations
Aoife Cahill | Josef van Genabith
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II and Penn-III Treebanks
Ruth O’Donovan | Michael Burke | Aoife Cahill | Josef van Genabith | Andy Way
Computational Linguistics, Volume 31, Number 3, September 2005

2004

Long-Distance Dependency Resolution in Automatically Acquired Wide-Coverage PCFG-Based LFG Approximations
Aoife Cahill | Michael Burke | Ruth O’Donovan | Josef van Genabith | Andy Way
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Large-Scale Induction and Evaluation of Lexical Resources from the Penn-II Treebank
Ruth O’Donovan | Michael Burke | Aoife Cahill | Josef van Genabith | Andy Way
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Treebank-Based Acquisition of a Chinese Lexical-Functional Grammar
Michael Burke | Olivia Lam | Aoife Cahill | Rowena Chan | Ruth O’Donovan | Adams Bodomo | Josef van Genabith | Andy Way
Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation

2002

TTS - A Treebank Tool Suite
Aoife Cahill | Josef van Genabith
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)
