Towards More Natural Artificial Languages

Mark Hopkins


Abstract
A number of papers have recently argued in favor of using artificially generated languages to investigate the inductive biases of linguistic models, or to develop models for low-resource languages with underrepresented typologies. But the promise of artificial languages comes with a caveat: if these artificial languages are not sufficiently reflective of natural language, then using them as a proxy may lead to inaccurate conclusions. In this paper, we take a step towards increasing the realism of artificial language by introducing a variant of indexed grammars that draw their weights from hierarchical Pitman-Yor processes. We show that this framework generates languages that emulate the statistics of natural language corpora better than the current approach of directly formulating weighted context-free grammars.
Anthology ID:
2022.conll-1.7
Volume:
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)
Month:
December
Year:
2022
Address:
Abu Dhabi, United Arab Emirates (Hybrid)
Editors:
Antske Fokkens, Vivek Srikumar
Venue:
CoNLL
SIG:
SIGNLL
Publisher:
Association for Computational Linguistics
Pages:
85–94
URL:
https://aclanthology.org/2022.conll-1.7
DOI:
10.18653/v1/2022.conll-1.7
Cite (ACL):
Mark Hopkins. 2022. Towards More Natural Artificial Languages. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 85–94, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
Cite (Informal):
Towards More Natural Artificial Languages (Hopkins, CoNLL 2022)
PDF:
https://aclanthology.org/2022.conll-1.7.pdf
Video:
https://aclanthology.org/2022.conll-1.7.mp4