CoSSAT: Code-Switched Speech Annotation Tool

Sanket Shah, Pratik Joshi, Sebastin Santy, Sunayana Sitaram


Abstract
Code-switching refers to the alternation of two or more languages in a conversation or utterance and is common in multilingual communities across the world. Building code-switched speech and natural language processing systems is challenging due to the lack of annotated speech and text data. We present CoSSAT, a speech annotation interface that helps annotators transcribe code-switched speech faster, more easily, and more accurately than a traditional interface by displaying candidate words from monolingual speech recognizers. We conduct a user study on the transcription of Hindi-English code-switched speech with 10 annotators and describe quantitative and qualitative results.
Anthology ID:
D19-5907
Volume:
Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP
Month:
November
Year:
2019
Address:
Hong Kong
Venues:
EMNLP | WS
Publisher:
Association for Computational Linguistics
Pages:
48–52
URL:
https://aclanthology.org/D19-5907
DOI:
10.18653/v1/D19-5907
Cite (ACL):
Sanket Shah, Pratik Joshi, Sebastin Santy, and Sunayana Sitaram. 2019. CoSSAT: Code-Switched Speech Annotation Tool. In Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP, pages 48–52, Hong Kong. Association for Computational Linguistics.
Cite (Informal):
CoSSAT: Code-Switched Speech Annotation Tool (Shah et al., EMNLP 2019)
PDF:
https://aclanthology.org/D19-5907.pdf
Attachment:
D19-5907.Attachment.pdf