Serhiy Bykh
2016
Advancing Linguistic Features and Insights by Label-informed Feature Grouping: An Exploration in the Context of Native Language Identification
Serhiy Bykh | Detmar Meurers
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
We propose a hierarchical clustering approach designed to group linguistic features for supervised machine learning that is inspired by variationist linguistics. The method makes it possible to abstract away from the individual feature occurrences by grouping features together that behave alike with respect to the target class, thus providing a new, more general perspective on the data. On the one hand, it reduces data sparsity, leading to quantitative performance gains. On the other, it supports the formation and evaluation of hypotheses about individual choices of linguistic structures. We explore the method using features based on verb subcategorization information and evaluate the approach in the context of the Native Language Identification (NLI) task.
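The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state a distance measure, linkage criterion, or stopping rule, so Euclidean distance over per-class relative frequencies, average linkage, and a fixed distance threshold are all assumptions made here. The feature names, class order, and counts are hypothetical toy data.

```python
from itertools import combinations


def class_profile(counts):
    """Turn one feature's per-class counts into a relative distribution."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def distance(p, q):
    """Euclidean distance between two class profiles (an assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def group_features(feature_counts, threshold):
    """Agglomerative, average-linkage grouping of features by class profile.

    feature_counts maps a feature name to its counts per class label.
    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed stopping rule).
    """
    profiles = {f: class_profile(c) for f, c in feature_counts.items()}
    clusters = [[f] for f in sorted(profiles)]

    def linkage(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(distance(profiles[x], profiles[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index j is still valid
        del clusters[j]
    return clusters


def merge_counts(feature_counts, clusters):
    """Sum the per-class counts inside each group to obtain denser features."""
    return {
        "+".join(group): [sum(col) for col in zip(*(feature_counts[f] for f in group))]
        for group in clusters
    }


# Hypothetical verb subcategorization features, counted per native language
# class (three made-up classes, in a fixed order).
toy_counts = {
    "give:NP_NP": [8, 1, 1],
    "give:NP_PP": [1, 8, 1],
    "send:NP_NP": [7, 2, 1],
    "send:NP_PP": [2, 7, 1],
    "tell:NP_S": [1, 1, 8],
}

groups = group_features(toy_counts, threshold=0.3)
grouped_counts = merge_counts(toy_counts, groups)
```

In this toy data the two double-object features end up in one group and the two prepositional features in another, because their distributions over the classes are similar. The summed counts in `grouped_counts` are less sparse than the individual features, which is the effect the abstract attributes to grouping.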
2014
Exploring Syntactic Features for Native Language Identification: A Variationist Perspective on Feature Encoding and Ensemble Optimization
Serhiy Bykh | Detmar Meurers
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
2013
Combining Shallow and Linguistically Motivated Features in Native Language Identification
Serhiy Bykh | Sowmya Vajjala | Julia Krivanek | Detmar Meurers
Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications