Ljubomir Popović
2016
How to Differentiate the Closely Related Standard Languages?
Duško Vitas | Ljubomir Popović | Cvetana Krstev | Anđelka Zečević
Proceedings of the Second International Conference on Computational Linguistics in Bulgaria (CLIB 2016)
This paper examines whether the SETimes corpus is an adequate basis for comparing the closely related languages used in the countries that emerged after the breakup of Yugoslavia, by comparing it with other corpora. It is shown that the phenomena observed in this corpus and used to illustrate differences, most specifically between Serbian and Croatian, are consistent neither with the standards of these languages nor with other sources. Results obtained on the basis of the SETimes corpus are therefore corpus-biased and have to be reconsidered. This demonstrates that the size and composition of a corpus used in linguistic research are crucial for assessing the results obtained.