2024
Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection
Gaetan Lopez Latouche, Marc-André Carbonneau, Benjamin Swanson
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Grammatical Error Detection (GED) methods rely heavily on human-annotated error corpora. However, these annotations are unavailable in many low-resource languages. In this paper, we investigate GED in this context. Leveraging the zero-shot cross-lingual transfer capabilities of multilingual pre-trained language models, we train a model using data from a diverse set of languages to generate synthetic errors in other languages. These synthetic error corpora are then used to train a GED model. Specifically, we propose a two-stage fine-tuning pipeline where the GED model is first fine-tuned on multilingual synthetic data from target languages, followed by fine-tuning on human-annotated GED corpora from source languages. This approach outperforms current state-of-the-art annotation-free GED methods. We also analyse the errors produced by our method and other strong baselines, finding that our approach produces errors that are more diverse and more similar to human errors.
BinaryAlign: Word Alignment as Binary Sequence Labeling
Gaetan Latouche, Marc-André Carbonneau, Benjamin Swanson
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Real-world deployments of word alignment are almost certain to cover both high- and low-resource languages. However, the state of the art for this task recommends a different model class depending on the availability of gold alignment training data for a particular language pair. We propose BinaryAlign, a novel word alignment technique based on binary sequence labeling that outperforms existing approaches in both scenarios, offering a unifying approach to the task. Additionally, we vary the specific choice of multilingual foundation model, perform stratified error analysis over alignment error type, and explore the performance of BinaryAlign on non-English language pairs. We make our source code publicly available.
2023
Generating Video Game Scripts with Style
Gaetan Lopez Latouche, Laurence Marcotte, Ben Swanson
Proceedings of the 5th Workshop on NLP for Conversational AI (NLP4ConvAI 2023)
While modern language models can generate a scripted scene in the format of a play, movie, or video game cutscene, the quality of machine-generated text remains behind that of human authors. In this work, we focus on one aspect of this quality gap: generating text in the style of an arbitrary and unseen character. We propose the Style Adaptive Semiparametric Scriptwriter (SASS), which leverages an adaptive weighted style memory to generate dialog lines in accordance with a character’s speaking patterns. Using the LIGHT dataset as well as a new corpus of scripts from twenty-three AAA video games, we show that SASS not only outperforms similar models but in some cases can also be used in conjunction with them to yield further improvement.
2021
Story Centaur: Large Language Model Few Shot Learning as a Creative Writing Tool
Ben Swanson, Kory Mathewson, Ben Pietrzak, Sherol Chen, Monica Dinalescu
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations
Few-shot learning with large language models has the potential to give individuals without formal machine learning training access to a wide range of text-to-text models. We consider how this applies to creative writers and present Story Centaur, a user interface for prototyping few-shot models and a set of recombinable web components that deploy them. Story Centaur’s goal is to expose creative writers to few-shot learning with a simple but powerful interface that lets them compose their own co-creation tools that further their own unique artistic directions. We build out several examples of such tools, and in the process probe the boundaries and issues surrounding generation with large language models.
2020
Usnea: An Authorship Tool for Interactive Fiction using Retrieval Based Semantic Parsing
Ben Swanson, Boris Smus
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
The reader of a choose-your-own-adventure novel and the user of a modern virtual assistant have a subtle similarity: both may, through the right lens, be viewed as engaging with a work of Interactive Fiction. This literary form emerged in the 1970s and has grown like a vine along the branch of modern technology, one guided by the advances of the other. In this work we weave together threads from the Interactive Fiction community and neural semantic parsing for dialog systems, defining the data model and necessary algorithms for a novel type of Interactive Fiction and open sourcing its accompanying authoring tool. Specifically, our work integrates retrieval-based semantic parsing predicates into the branching story structures well known to the Interactive Fiction community, relaxing the relatively strict lexical options of preexisting systems.
2014
Natural Language Generation with Vocabulary Constraints
Ben Swanson, Elif Yamangil, Eugene Charniak
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications
Data Driven Language Transfer Hypotheses
Ben Swanson, Eugene Charniak
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers
2013
A Context Free TAG Variant
Ben Swanson, Elif Yamangil, Eugene Charniak, Stuart Shieber
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Extracting the Native Language Signal for Second Language Acquisition
Ben Swanson, Eugene Charniak
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Exploring Syntactic Representations for Native Language Identification
Ben Swanson
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications
2012
Correction Detection and Error Type Selection as an ESL Educational Aid
Ben Swanson, Elif Yamangil
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Native Language Detection with Tree Substitution Grammars
Benjamin Swanson, Eugene Charniak
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)