Christopher Bryant


2021

pdf bib
Document-level grammatical error correction
Zheng Yuan | Christopher Bryant
Proceedings of the 16th Workshop on Innovative Use of NLP for Building Educational Applications

Document-level context can provide valuable information in grammatical error correction (GEC), which is crucial for correcting certain errors and resolving inconsistencies. In this paper, we investigate context-aware approaches and propose document-level GEC systems. Additionally, we employ a three-step training strategy to benefit from both sentence-level and document-level data. Our system outperforms previous document-level and all other NMT-based single-model systems, achieving state of the art on a common test set.

2020

pdf bib
A Crash Course in Automatic Grammatical Error Correction
Roman Grundkiewicz | Christopher Bryant | Mariano Felice
Proceedings of the 28th International Conference on Computational Linguistics: Tutorial Abstracts

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting all types of errors in written text. Although most research has focused on correcting errors in the context of English as a Second Language (ESL), GEC can also be applied to other languages and native text. The main application of a GEC system is thus to assist humans with their writing. Academic and commercial interest in GEC has grown significantly since the Helping Our Own (HOO) and Conference on Natural Language Learning (CoNLL) shared tasks in 2011-14, and a record-breaking 24 teams took part in the recent Building Educational Applications (BEA) shared task. Given this interest, and the recent shift towards neural approaches, we believe the time is right to offer a tutorial on GEC for researchers who may be new to the field or who are interested in the current state of the art and future challenges. With this in mind, the main goal of this tutorial is not only to bring attendees up to speed with GEC in general, but also examine the development of neural-based GEC systems.

pdf bib
CanVEC - the Canberra Vietnamese-English Code-switching Natural Speech Corpus
Li Nguyen | Christopher Bryant
Proceedings of the 12th Language Resources and Evaluation Conference

This paper introduces the Canberra Vietnamese-English Code-switching corpus (CanVEC), an original corpus of natural mixed speech that we semi-automatically annotated with language information, part of speech (POS) tags and Vietnamese translations. The corpus, which was built to inform a sociolinguistic study on language variation and code-switching, consists of 10 hours of recorded speech (87k tokens) between 45 Vietnamese-English bilinguals living in Canberra, Australia. We describe how we collected and annotated the corpus by pipelining several monolingual toolkits to considerably speed up the annotation process. We also describe how we evaluated the automatic annotations to ensure corpus reliability. We make the corpus available for research purposes.

2019

pdf bib
Neural Grammatical Error Correction with Finite State Transducers
Felix Stahlberg | Christopher Bryant | Bill Byrne
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Grammatical error correction (GEC) is one of the areas in natural language processing in which purely neural models have not yet superseded more traditional symbolic models. Hybrid systems combining phrase-based statistical machine translation (SMT) and neural sequence models are currently among the most effective approaches to GEC. However, both SMT and neural sequence-to-sequence models require large amounts of annotated data. Language model based GEC (LM-GEC) is a promising alternative which does not rely on annotated training data. We show how to improve LM-GEC by applying modelling techniques based on finite state transducers. We report further gains by rescoring with neural language models. We show that our methods developed for LM-GEC can also be used with SMT systems if annotated training data is available. Our best system outperforms the best published result on the CoNLL-2014 test set, and achieves far better relative improvements over the SMT baselines than previous hybrid systems.

pdf bib
The BEA-2019 Shared Task on Grammatical Error Correction
Christopher Bryant | Mariano Felice | Øistein E. Andersen | Ted Briscoe
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

This paper reports on the BEA-2019 Shared Task on Grammatical Error Correction (GEC). As with the CoNLL-2014 shared task, participants are required to correct all types of errors in test data. One of the main contributions of the BEA-2019 shared task is the introduction of a new dataset, the Write&Improve+LOCNESS corpus, which represents a wider range of native and learner English levels and abilities. Another contribution is the introduction of tracks, which control the amount of annotated data available to participants. Systems are evaluated in terms of ERRANT F_0.5, which allows us to report a much wider range of performance statistics. The competition was hosted on Codalab and remains open for further submissions on the blind test set.

2018

pdf bib
Language Model Based Grammatical Error Correction without Annotated Training Data
Christopher Bryant | Ted Briscoe
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

Since the end of the CoNLL-2014 shared task on grammatical error correction (GEC), research into language model (LM) based approaches to GEC has largely stagnated. In this paper, we re-examine LMs in GEC and show that it is entirely possible to build a simple system that not only requires minimal annotated data (∼1000 sentences), but is also fairly competitive with several state-of-the-art systems. This approach should be of particular interest for languages where very little annotated training data exists, although we also hope to use it as a baseline to motivate future research.

2017

pdf bib
Automatic Annotation and Evaluation of Error Types for Grammatical Error Correction
Christopher Bryant | Mariano Felice | Ted Briscoe
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Until now, error type performance for Grammatical Error Correction (GEC) systems could only be measured in terms of recall because system output is not annotated. To overcome this problem, we introduce ERRANT, a grammatical ERRor ANnotation Toolkit designed to automatically extract edits from parallel original and corrected sentences and classify them according to a new, dataset-agnostic, rule-based framework. This not only facilitates error type evaluation at different levels of granularity, but can also be used to reduce annotator workload and standardise existing GEC datasets. Human experts rated the automatic edits as “Good” or “Acceptable” in at least 95% of cases, so we applied ERRANT to the system output of the CoNLL-2014 shared task to carry out a detailed error type analysis for the first time.

2016

pdf bib
Automatic Extraction of Learner Errors in ESL Sentences Using Linguistically Enhanced Alignments
Mariano Felice | Christopher Bryant | Ted Briscoe
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We propose a new method of automatically extracting learner errors from parallel English as a Second Language (ESL) sentences in an effort to regularise annotation formats and reduce inconsistencies. Specifically, given an original and corrected sentence, our method first uses a linguistically enhanced alignment algorithm to determine the most likely mappings between tokens, and secondly employs a rule-based function to decide which alignments should be merged. Our method beats all previous approaches on the tested datasets, achieving state-of-the-art results for automatic error extraction.

2015

pdf bib
How Far are We from Fully Automatic High Quality Grammatical Error Correction?
Christopher Bryant | Hwee Tou Ng
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

pdf bib
The CoNLL-2015 Shared Task on Shallow Discourse Parsing
Nianwen Xue | Hwee Tou Ng | Sameer Pradhan | Rashmi Prasad | Christopher Bryant | Attapol Rutherford
Proceedings of the Nineteenth Conference on Computational Natural Language Learning - Shared Task

2014

pdf bib
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task
Hwee Tou Ng | Siew Mei Wu | Ted Briscoe | Christian Hadiwinoto | Raymond Hendy Susanto | Christopher Bryant
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task

pdf bib
The CoNLL-2014 Shared Task on Grammatical Error Correction
Hwee Tou Ng | Siew Mei Wu | Ted Briscoe | Christian Hadiwinoto | Raymond Hendy Susanto | Christopher Bryant
Proceedings of the Eighteenth Conference on Computational Natural Language Learning: Shared Task