Stacked Sentence-Document Classifier Approach for Improving Native Language Identification

Andrea Cimino, Felice Dell’Orletta

Abstract
In this paper, we describe the approach of the ItaliaNLP Lab team to native language identification and discuss the results we submitted as participants to the essay track of NLI Shared Task 2017. We introduce for the first time a 2-stacked sentence-document architecture for native language identification that is able to exploit both local sentence information and a wide set of general-purpose features qualifying the lexical and grammatical structure of the whole document. When evaluated on the official test set, our sentence-document stacked architecture obtained the best result among all the participants of the essay track with an F1 score of 0.8818.
Anthology ID:
W17-5049
Volume:
Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Joel Tetreault, Jill Burstein, Claudia Leacock, Helen Yannakoudakis
Venue:
BEA
SIG:
SIGEDU
Publisher:
Association for Computational Linguistics
Pages:
430–437
URL:
https://aclanthology.org/W17-5049
DOI:
10.18653/v1/W17-5049
Cite (ACL):
Andrea Cimino and Felice Dell’Orletta. 2017. Stacked Sentence-Document Classifier Approach for Improving Native Language Identification. In Proceedings of the 12th Workshop on Innovative Use of NLP for Building Educational Applications, pages 430–437, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Stacked Sentence-Document Classifier Approach for Improving Native Language Identification (Cimino & Dell’Orletta, BEA 2017)
PDF:
https://aclanthology.org/W17-5049.pdf