Pedro Vernetti


2020

An Assessment of Language Identification Methods on Tweets and Wikipedia Articles
Pedro Vernetti | Larissa Freitas
Proceedings of the Fourth Widening Natural Language Processing Workshop

Language identification is the task of determining the language in which a given text is written. This task is important for Natural Language Processing and Information Retrieval activities. Two popular approaches to language identification are N-gram models and stopword models. In this paper, these two models were tested on different types of documents: short, irregular texts (tweets) and long, regular texts (Wikipedia articles).
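The two model families named in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes whitespace tokenization for the stopword model and character trigram profiles compared by cosine similarity for the N-gram model. The stopword lists, sample texts, and function names (`stopword_identify`, `ngram_identify`) are hypothetical and much smaller than anything used in practice.

```python
from collections import Counter
import math

# Hypothetical, tiny stopword lists for illustration only.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "a", "which", "this"},
    "pt": {"o", "a", "e", "de", "que", "em", "um", "uma", "para", "não"},
}

def stopword_identify(text: str) -> str:
    """Stopword model: pick the language whose list matches the most tokens."""
    tokens = text.lower().split()
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)

def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count character n-grams, padding with spaces so word boundaries count."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical training snippets; a real model would be built from large corpora.
SAMPLES = {
    "en": "the language of this text is english and it is written in the usual way",
    "pt": "a língua deste texto é o português e ele está escrito da maneira usual",
}
PROFILES = {lang: char_ngrams(s) for lang, s in SAMPLES.items()}

def ngram_identify(text: str) -> str:
    """N-gram model: pick the language whose profile is most similar to the text."""
    grams = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))
```

The stopword model only gains evidence from tokens that exactly match a list entry, whereas the N-gram model also scores partial matches from word fragments, so the two can behave differently on short or irregularly spelled input.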