Transformed Protoform Reconstruction

Young Min Kim, Kalvin Chang, Chenxuan Cui, David R. Mortensen


Abstract
Protoform reconstruction is the task of inferring what morphemes or words looked like in the ancestral languages of a set of daughter languages. Meloni et al. (2021) achieved the state of the art on Latin protoform reconstruction with an RNN-based encoder-decoder model with attention. We update their model with the state-of-the-art seq2seq model: the Transformer. Our model outperforms theirs on a suite of metrics on two datasets: their Romance data of 8,000 cognates spanning 5 languages and a Chinese dataset (Hou 2004) of 800+ cognates spanning 39 varieties. We also probe our model for potential phylogenetic signal. Our code is publicly available at https://github.com/cmu-llab/acl-2023.
Anthology ID:
2023.acl-short.3
Volume:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Month:
July
Year:
2023
Address:
Toronto, Canada
Editors:
Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
24–38
URL:
https://aclanthology.org/2023.acl-short.3
DOI:
10.18653/v1/2023.acl-short.3
Cite (ACL):
Young Min Kim, Kalvin Chang, Chenxuan Cui, and David R. Mortensen. 2023. Transformed Protoform Reconstruction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 24–38, Toronto, Canada. Association for Computational Linguistics.
Cite (Informal):
Transformed Protoform Reconstruction (Kim et al., ACL 2023)
PDF:
https://aclanthology.org/2023.acl-short.3.pdf
Video:
https://aclanthology.org/2023.acl-short.3.mp4