Bo Blankers
2017
The Power of Character N-grams in Native Language Identification
Artur Kulmizev | Bo Blankers | Johannes Bjerva | Malvina Nissim | Gertjan van Noord | Barbara Plank | Martijn Wieling
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
In this paper, we explore the performance of a linear SVM trained on language-independent character features for the NLI Shared Task 2017. Our basic system (GRONINGEN) achieves the best performance (87.56 F1-score) on the evaluation set using only character n-grams of length 1-9 as features. We compare this against several ensemble and meta-classifiers in order to examine how the linear system fares when combined with other, especially non-linear, classifiers. Special emphasis is placed on the topic bias that exists by virtue of the assessment essay prompt distribution.
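As an illustration of the feature type the abstract describes, the following is a minimal sketch of counting character n-grams of length 1 to 9 in a text. It is not the authors' implementation: the function name `char_ngrams`, its parameters, and the sample sentence are hypothetical, and the sketch covers only feature extraction, not the weighting or the linear SVM training used in the paper.

```python
from collections import Counter


def char_ngrams(text: str, n_min: int = 1, n_max: int = 9) -> Counter:
    """Count every character n-gram of length n_min..n_max in text.

    Hypothetical helper for illustration; the range 1-9 follows the abstract.
    """
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the text; the range is empty
        # when the text is shorter than n, so no n-grams are added.
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


# Example: sparse count features for one short (made-up) text.
features = char_ngrams("the cat")
```

Counts of this kind could then be turned into a sparse feature vector per essay and passed to a linear classifier; how the paper weights or normalizes the features is not stated in the abstract.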
Co-authors
- Johannes Bjerva 1
- Artur Kulmizev 1
- Malvina Nissim 1
- Barbara Plank 1
- Martijn Wieling 1
Venues
- bea1