String Transduction with Target Language Models and Insertion Handling

Garrett Nicolai, Saeed Najafi, Grzegorz Kondrak


Abstract
Many character-level tasks can be framed as sequence-to-sequence transduction, where the target is a word from a natural language. We show that leveraging target language models derived from unannotated target corpora, combined with a precise alignment of the training data, yields state-of-the-art results on cognate projection, inflection generation, and phoneme-to-grapheme conversion.
Anthology ID:
W18-5805
Volume:
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology
Month:
October
Year:
2018
Address:
Brussels, Belgium
Editors:
Sandra Kuebler, Garrett Nicolai
Venue:
EMNLP
SIG:
SIGMORPHON
Publisher:
Association for Computational Linguistics
Pages:
43–53
URL:
https://aclanthology.org/W18-5805
DOI:
10.18653/v1/W18-5805
Cite (ACL):
Garrett Nicolai, Saeed Najafi, and Grzegorz Kondrak. 2018. String Transduction with Target Language Models and Insertion Handling. In Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pages 43–53, Brussels, Belgium. Association for Computational Linguistics.
Cite (Informal):
String Transduction with Target Language Models and Insertion Handling (Nicolai et al., EMNLP 2018)
PDF:
https://aclanthology.org/W18-5805.pdf