CIC-FBK Approach to Native Language Identification

Ilia Markov, Lingzhen Chen, Carlo Strapparava, Grigori Sidorov


Abstract
We present the CIC-FBK system, which took part in the Native Language Identification (NLI) Shared Task 2017. Our approach combines features commonly used in previous NLI research (word n-grams, lemma n-grams, part-of-speech n-grams, and function words) with recently introduced character n-grams from misspelled words and with features that are novel in this task, such as typed character n-grams and syntactic n-grams of words and of syntactic relation tags. We use the log-entropy weighting scheme and perform classification with a Support Vector Machine (SVM). Our system achieved a macro-averaged F1-score of 0.8808 and tied for first place in the NLI Shared Task 2017 ranking.
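The abstract names log-entropy weighting without stating a formula. As a rough illustration only, the sketch below uses one common formulation from the information retrieval literature: a local weight log(1 + tf) multiplied by a global weight 1 + sum_j(p_ij log p_ij) / log N, where p_ij is the share of feature i's total count that falls in document j and N is the number of documents. The function name `log_entropy_weights` and the toy counts are hypothetical, and the authors' actual implementation may differ in its details.

```python
import math


def log_entropy_weights(counts):
    """Apply log-entropy weighting to a document-by-feature count matrix.

    counts: list of documents, each a list of raw feature counts
            (all documents have the same number of features).
    Returns a matrix of the same shape holding the weighted values.
    """
    n_docs = len(counts)
    n_feats = len(counts[0])

    # Total count of each feature over the whole collection.
    global_freq = [sum(doc[i] for doc in counts) for i in range(n_feats)]

    global_weight = []
    for i in range(n_feats):
        entropy_sum = 0.0
        for doc in counts:
            if doc[i] > 0:
                p = doc[i] / global_freq[i]
                entropy_sum += p * math.log(p)
        if n_docs > 1:
            # 0 for a feature spread evenly over all documents,
            # 1 for a feature concentrated in a single document.
            g = 1.0 + entropy_sum / math.log(n_docs)
        else:
            g = 1.0
        global_weight.append(g)

    # Local weight log(1 + tf) scaled by the feature's global weight.
    return [
        [math.log(1 + doc[i]) * global_weight[i] for i in range(n_feats)]
        for doc in counts
    ]


# Toy example: feature 0 is evenly spread, feature 1 occurs in one document.
weighted = log_entropy_weights([[2, 3], [2, 0]])
```

Under this formulation, a feature that is distributed evenly across documents is weighted down to zero, while a feature that appears in only one document keeps its full log-scaled count. The weighted matrix would then be passed to a classifier such as an SVM.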
Anthology ID:
W17-5042
Volume:
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Joel Tetreault, Jill Burstein, Claudia Leacock, Helen Yannakoudakis
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
374–381
URL:
https://aclanthology.org/W17-5042/
DOI:
10.18653/v1/W17-5042
Cite (ACL):
Ilia Markov, Lingzhen Chen, Carlo Strapparava, and Grigori Sidorov. 2017. CIC-FBK Approach to Native Language Identification. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 374–381, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
CIC-FBK Approach to Native Language Identification (Markov et al., BEA 2017)
PDF:
https://aclanthology.org/W17-5042.pdf