Adriane Boyd


2018

Using Wikipedia Edits in Low Resource Grammatical Error Correction
Adriane Boyd
Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text

We develop a grammatical error correction (GEC) system for German using a small gold GEC corpus augmented with edits extracted from Wikipedia revision history. We extend the automatic error annotation tool ERRANT (Bryant et al., 2017) for German and use it to analyze both gold GEC corrections and Wikipedia edits (Grundkiewicz and Junczys-Dowmunt, 2014) in order to select as additional training data Wikipedia edits containing grammatical corrections similar to those in the gold corpus. Using a multilayer convolutional encoder-decoder neural network GEC approach (Chollampatt and Ng, 2018), we evaluate the contribution of Wikipedia edits and find that carefully selected Wikipedia edits increase performance by over 5%.

Normalization in Context: Inter-Annotator Agreement for Meaning-Based Target Hypothesis Annotation
Adriane Boyd
Proceedings of the 7th workshop on NLP for Computer Assisted Language Learning

2014

The MERLIN corpus: Learner language and the CEFR
Adriane Boyd | Jirka Hana | Lionel Nicolas | Detmar Meurers | Katrin Wisniewski | Andrea Abel | Karin Schöne | Barbora Štindlová | Chiara Vettori
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The MERLIN corpus is a written learner corpus for Czech, German, and Italian that has been designed to illustrate the Common European Framework of Reference for Languages (CEFR) with authentic learner data. The corpus contains 2,290 learner texts produced in standardized language certifications covering CEFR levels A1-C1. The MERLIN annotation scheme includes a wide range of language characteristics that enable research into the empirical foundations of the CEFR scales and provide language teachers, test developers, and Second Language Acquisition researchers with concrete examples of learner performance and progress across multiple proficiency levels. For computational linguistics, it provides a range of authentic learner data for three target languages, supporting a broadening of the scope of research in areas such as automatic proficiency classification or native language identification. The annotated corpus and related information will be freely available as a corpus resource and through a freely accessible, didactically-oriented online platform.

2012

Informing Determiner and Preposition Error Correction with Hierarchical Word Clustering
Adriane Boyd | Marion Zepf | Detmar Meurers
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

2011

Data-Driven Correction of Function Words in Non-Native English
Adriane Boyd | Detmar Meurers
Proceedings of the 13th European Workshop on Natural Language Generation

2010

Proceedings of the NAACL HLT 2010 Student Research Workshop
Julia Hockenmaier | Diane Litman | Adriane Boyd | Mahesh Joshi | Frank Rudzicz
Proceedings of the NAACL HLT 2010 Student Research Workshop

Enhancing Authentic Web Pages for Language Learners
Detmar Meurers | Ramon Ziai | Luiz Amaral | Adriane Boyd | Aleksandar Dimitrov | Vanessa Metcalf | Niels Ott
Proceedings of the NAACL HLT 2010 Fifth Workshop on Innovative Use of NLP for Building Educational Applications

EAGLE: an Error-Annotated Corpus of Beginning Learner German
Adriane Boyd
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes the Error-Annotated German Learner Corpus (EAGLE), a corpus of beginning learner German with grammatical error annotation. The corpus contains online workbook and hand-written essay data from learners in introductory German courses at The Ohio State University. We introduce an error typology developed for beginning learners of German that focuses on linguistic properties of lexical items present in the learner data and present the detailed error typologies for selection, agreement, and word order errors. The corpus uses an error annotation format that extends the multi-layer standoff format proposed by Luedeling et al. (2005) to include incremental target hypotheses for each error. In this format, each annotated error includes information about the location of tokens affected by the error, the error type, and the proposed target correction. The multi-layer standoff format allows us to annotate ambiguous errors with more than one possible target correction and to annotate the multiple, overlapping errors common in beginning learner productions.

2009

Pronunciation Modeling in Spelling Correction for Writers of English as a Foreign Language
Adriane Boyd
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Student Research Workshop and Doctoral Consortium

2008

Revisiting the Impact of Different Annotation Schemes on PCFG Parsing: A Grammatical Dependency Evaluation
Adriane Boyd | Detmar Meurers
Proceedings of the Workshop on Parsing German

2007

Discontinuity Revisited: An Improved Conversion to Context-Free Representations
Adriane Boyd
Proceedings of the Linguistic Annotation Workshop

2005

Identifying Non-Referential it: A Machine Learning Approach Incorporating Linguistically Motivated Patterns
Adriane Boyd | Whitney Gegg-Harrison | Donna Byron
Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing