A Turkish-German Code-Switching Corpus

Özlem Çetinoğlu


Abstract
Bilingual communities often alternate between languages in both spoken and written communication. One such community, residents of Germany of Turkish origin, produces Turkish-German code-switching by heavily mixing the two languages at the discourse, sentence, or word level. Code-switching in general, and Turkish-German code-switching in particular, has long been studied from a linguistic perspective. Yet resources for studying it from a computational perspective are limited, due either to their small size or to licensing issues. In this work, we contribute to solving this problem with a corpus. We present a Turkish-German code-switching corpus that consists of 1029 tweets, in which intra-sentential switches form the majority. We describe the different types of code-switching we observed in our collection, as well as our processing steps. The first step is data collection and filtering. This is followed by manual tokenisation and normalisation. Finally, we annotate the data with word-level language identification information. The resulting corpus is available for research purposes.
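Word-level language identification assigns a language label to each token, which makes it possible to locate the points where a tweet switches language. The Python sketch below is an illustration only. It assumes a simple representation, a list of (token, tag) pairs with hypothetical tags TR, DE, and OTHER; the corpus's actual tag set and file format are not given on this page. The example sentence is invented and is not taken from the corpus.

```python
from typing import List, Optional, Tuple

# Hypothetical language tags for illustration; the corpus's real labels may differ.
LANG_TAGS = {"TR", "DE"}


def switch_points(tagged: List[Tuple[str, str]]) -> List[int]:
    """Return indices of tokens whose language tag differs from that of the
    previous language-tagged token. Tokens with non-language tags (e.g.
    punctuation tagged "OTHER") are skipped rather than counted as switches."""
    points: List[int] = []
    prev_lang: Optional[str] = None
    for i, (_token, tag) in enumerate(tagged):
        if tag not in LANG_TAGS:
            continue  # ignore punctuation, URLs, etc.
        if prev_lang is not None and tag != prev_lang:
            points.append(i)
        prev_lang = tag
    return points


# Invented example of an intra-sentential switch (German -> Turkish).
example = [
    ("Ich", "DE"), ("bin", "DE"), ("so", "DE"), ("müde", "DE"), (",", "OTHER"),
    ("bugün", "TR"), ("çok", "TR"), ("çalıştım", "TR"), (".", "OTHER"),
]
print(switch_points(example))  # [5]
```

Skipping non-language tokens means that punctuation between two words of the same language is never counted as a switch. Whether such tokens should break or bridge a switch is a modelling choice, not something this sketch settles.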
Anthology ID: L16-1667
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 4215–4220
URL: https://aclanthology.org/L16-1667
Cite (ACL): Özlem Çetinoğlu. 2016. A Turkish-German Code-Switching Corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 4215–4220, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): A Turkish-German Code-Switching Corpus (Çetinoğlu, LREC 2016)
PDF: https://aclanthology.org/L16-1667.pdf