From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology

Mark Dingemanse, Andreas Liesenfeld


Abstract
Informal social interaction is the primordial home of human language. Linguistically diverse conversational corpora are an important and largely untapped resource for computational linguistics and language technology. Through the efforts of a worldwide language documentation movement, such corpora are increasingly becoming available. We show how interactional data from 63 languages (26 families) harbours insights about turn-taking, timing, sequential structure and social action, with implications for language technology, natural language understanding, and the design of conversational interfaces. Harnessing linguistically diverse conversational corpora will provide the empirical foundations for flexible, localizable, humane language technologies of the future.
Anthology ID:
2022.acl-long.385
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
5614–5633
URL:
https://aclanthology.org/2022.acl-long.385
DOI:
10.18653/v1/2022.acl-long.385
Cite (ACL):
Mark Dingemanse and Andreas Liesenfeld. 2022. From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5614–5633, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
From text to talk: Harnessing conversational corpora for humane and diversity-aware language technology (Dingemanse & Liesenfeld, ACL 2022)
PDF:
https://aclanthology.org/2022.acl-long.385.pdf
Video:
https://aclanthology.org/2022.acl-long.385.mp4