Serhiy Bykh


2016

Advancing Linguistic Features and Insights by Label-informed Feature Grouping: An Exploration in the Context of Native Language Identification
Serhiy Bykh | Detmar Meurers
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

We propose a hierarchical clustering approach designed to group linguistic features for supervised machine learning that is inspired by variationist linguistics. The method makes it possible to abstract away from the individual feature occurrences by grouping features together that behave alike with respect to the target class, thus providing a new, more general perspective on the data. On the one hand, it reduces data sparsity, leading to quantitative performance gains. On the other, it supports the formation and evaluation of hypotheses about individual choices of linguistic structures. We explore the method using features based on verb subcategorization information and evaluate the approach in the context of the Native Language Identification (NLI) task.

2014

Exploring Syntactic Features for Native Language Identification: A Variationist Perspective on Feature Encoding and Ensemble Optimization
Serhiy Bykh | Detmar Meurers
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

2013

Combining Shallow and Linguistically Motivated Features in Native Language Identification
Serhiy Bykh | Sowmya Vajjala | Julia Krivanek | Detmar Meurers
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications

2012

Native Language Identification using Recurring n-grams – Investigating Abstraction and Domain Dependence
Serhiy Bykh | Detmar Meurers
Proceedings of COLING 2012