Chris Brust
2020
Simultaneous Translation and Paraphrase for Language Education
Stephen Mayhew | Klinton Bicknell | Chris Brust | Bill McDowell | Will Monroe | Burr Settles
Proceedings of the Fourth Workshop on Neural Generation and Translation
We present the task of Simultaneous Translation and Paraphrasing for Language Education (STAPLE). Given a prompt in one language, the goal is to generate a diverse set of correct translations that language learners are likely to produce. This is motivated by the need to create and maintain large, high-quality sets of acceptable translations for exercises in a language-learning application, and synthesizes work spanning machine translation, MT evaluation, automatic paraphrasing, and language education technology. We developed a novel corpus with unique properties for five languages (Hungarian, Japanese, Korean, Portuguese, and Vietnamese), and report on the results of a shared task challenge which attracted 20 teams to solve the task. In our meta-analysis, we focus on three aspects of the resulting systems: external training corpus selection, model architecture and training decisions, and decoding and filtering strategies. We find that strong systems start with a large amount of generic training data, and then fine-tune with in-domain data, sampled according to our provided learner response frequencies.
2018
Second Language Acquisition Modeling
Burr Settles | Chris Brust | Erin Gustafson | Masato Hagiwara | Nitin Madnani
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications
We present the task of second language acquisition (SLA) modeling. Given a history of errors made by learners of a second language, the task is to predict errors that they are likely to make at arbitrary points in the future. We describe a large corpus of more than 7M words produced by more than 6k learners of English, Spanish, and French using Duolingo, a popular online language-learning app. Then we report on the results of a shared task challenge aimed at studying the SLA task via this corpus, which attracted 15 teams and synthesized work from various fields including cognitive science, linguistics, and machine learning.
Co-authors
- Burr Settles 2
- Klinton Bicknell 1
- Erin Gustafson 1
- Masato Hagiwara 1
- Nitin Madnani 1