Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language

Peter A. Jansen, Jordan Boyd-Graber


Abstract
Tamarian, a fictional language introduced in the Star Trek episode Darmok, communicates meaning through utterances built from metaphorical references, such as “Darmok and Jalad at Tanagra” in place of “We should work together.” This work assembles a Tamarian-English dictionary of utterances from the original episode and several follow-on novels, and uses it to construct a parallel corpus of 456 English-Tamarian utterances. A machine translation system based on a large language model (T5) is trained on this parallel corpus and is shown to reach 76% accuracy when translating known utterances from English to Tamarian.
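The abstract does not state how translation accuracy is scored. As a minimal sketch, assuming accuracy means exact match after light normalization, the evaluation could look like the following. The names `normalize`, `exact_match_accuracy`, and `lookup_translate` are hypothetical, the dictionary lookup merely stands in for the trained T5 model, and the single toy pair is the example quoted in the abstract.

```python
from typing import Callable, Dict, Sequence, Tuple


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing sentence punctuation."""
    return " ".join(text.lower().split()).rstrip(".!?")


def exact_match_accuracy(
    pairs: Sequence[Tuple[str, str]],
    translate: Callable[[str], str],
) -> float:
    """Fraction of (English, Tamarian) pairs whose translation matches exactly.

    Assumption: exact match is only one plausible scoring rule; the paper's
    actual metric may differ.
    """
    if not pairs:
        raise ValueError("need at least one evaluation pair")
    correct = sum(
        normalize(translate(english)) == normalize(tamarian)
        for english, tamarian in pairs
    )
    return correct / len(pairs)


# Toy parallel corpus: the one pair quoted in the abstract.
toy_pairs = [("We should work together.", "Darmok and Jalad at Tanagra")]

# Hypothetical stand-in for a trained model: a plain dictionary lookup.
toy_dictionary: Dict[str, str] = {
    normalize(english): tamarian for english, tamarian in toy_pairs
}


def lookup_translate(english: str) -> str:
    """Return the dictionary entry for an English utterance, or an empty string."""
    return toy_dictionary.get(normalize(english), "")


if __name__ == "__main__":
    print(exact_match_accuracy(toy_pairs, lookup_translate))
```

Swapping `lookup_translate` for a function that calls a fine-tuned sequence-to-sequence model would leave the scoring code unchanged, which is the main reason to pass the translator in as a callable.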
Anthology ID:
2022.flp-1.5
Volume:
Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Debanjan Ghosh, Beata Beigman Klebanov, Smaranda Muresan, Anna Feldman, Soujanya Poria, Tuhin Chakrabarty
Venue:
Fig-Lang
Publisher:
Association for Computational Linguistics
Pages:
34–38
URL:
https://aclanthology.org/2022.flp-1.5
DOI:
10.18653/v1/2022.flp-1.5
Cite (ACL):
Peter A. Jansen and Jordan Boyd-Graber. 2022. Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language. In Proceedings of the 3rd Workshop on Figurative Language Processing (FLP), pages 34–38, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language (Jansen & Boyd-Graber, Fig-Lang 2022)
PDF:
https://aclanthology.org/2022.flp-1.5.pdf
Video:
https://aclanthology.org/2022.flp-1.5.mp4