How to Differentiate the Closely Related Standard Languages?

Duško Vitas, Ljubomir Popović, Cvetana Krstev, Anđelka Zečević


Abstract
This paper discusses the adequacy of the SETimes corpus as a basis for comparing the closely related languages used in the countries that emerged after the breakup of Yugoslavia, by comparing it with other corpora. It is shown that the phenomena observed in this corpus and used to illustrate differences, most specifically between Serbian and Croatian, are consistent neither with their standards nor with other sources. Thus, results obtained on the basis of the SETimes corpus are corpus-biased and have to be reconsidered. This proves that the size and composition of a corpus used in linguistic research are crucial for assessing the obtained results.
Anthology ID:
2016.clib-1.1
Volume:
Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016)
Month:
September
Year:
2016
Address:
Sofia, Bulgaria
Venue:
CLIB
Publisher:
Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences
Pages:
1–10
URL:
https://aclanthology.org/2016.clib-1.1
Cite (ACL):
Duško Vitas, Ljubomir Popović, Cvetana Krstev, and Anđelka Zečević. 2016. How to Differentiate the Closely Related Standard Languages?. In Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016), pages 1–10, Sofia, Bulgaria. Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences.
Cite (Informal):
How to Differentiate the Closely Related Standard Languages? (Vitas et al., CLIB 2016)
PDF:
https://aclanthology.org/2016.clib-1.1.pdf