Meeting the 2020 Duolingo Challenge on a Shoestring

Tadashi Nomoto


Abstract
What follows is a brief description of two systems, gFCONV and c-VAE, which we built in response to the 2020 Duolingo Challenge. Both are neural models that aim to disrupt the sentence representation the encoder generates, with an eye to increasing the diversity of the sentences that emerge from the process. Importantly, we decided not to turn to external sources for extra ammunition, curious to know how far we could go while confining ourselves to the data released by Duolingo. gFCONV works by taking over a pre-trained sequence model and intercepting the output its encoder produces on its way to the decoder. c-VAE is a conditional variational auto-encoder that seeks diversity by blurring the representation the encoder derives. Experiments on a corpus constructed from the public Duolingo dataset, containing some 4 million pairs of sentences, found that gFCONV is a consistent winner over c-VAE, though both suffered heavily from low recall.
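The abstract describes c-VAE as "blurring" the encoder's representation before it reaches the decoder. The paper itself gives no code; the sketch below only illustrates the general mechanism behind that idea, the reparameterization step of a variational auto-encoder, where a latent vector is perturbed with Gaussian noise. All names and the noise scale are illustrative assumptions, not taken from the paper.

```python
import random

def blur_representation(z, sigma=0.1, rng=random):
    """Perturb an encoder representation (a list of floats) with
    Gaussian noise, as in the VAE reparameterization step:
    z' = z + eps, with eps ~ N(0, sigma^2) per dimension.
    `sigma` is an illustrative choice, not a value from the paper."""
    return [zi + rng.gauss(0.0, sigma) for zi in z]

# Each call yields a different perturbed vector, so a decoder fed
# these samples can produce a more diverse set of output sentences.
encoding = [0.5, -1.0, 2.0]
sample_a = blur_representation(encoding)
sample_b = blur_representation(encoding)
```

With `sigma=0.0` the function returns the input unchanged; larger values trade fidelity to the source sentence for diversity in the generated outputs.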
Anthology ID:
2020.ngt-1.14
Volume:
Proceedings of the Fourth Workshop on Neural Generation and Translation
Month:
July
Year:
2020
Address:
Online
Venues:
ACL | NGT | WS
Publisher:
Association for Computational Linguistics
Pages:
129–133
URL:
https://aclanthology.org/2020.ngt-1.14
DOI:
10.18653/v1/2020.ngt-1.14
Cite (ACL):
Tadashi Nomoto. 2020. Meeting the 2020 Duolingo Challenge on a Shoestring. In Proceedings of the Fourth Workshop on Neural Generation and Translation, pages 129–133, Online. Association for Computational Linguistics.
Cite (Informal):
Meeting the 2020 Duolingo Challenge on a Shoestring (Nomoto, NGT 2020)
PDF:
https://aclanthology.org/2020.ngt-1.14.pdf
Video:
http://slideslive.com/38929828
Data
Duolingo STAPLE Shared Task