From Lemmas to Dependencies: What Signals Drive Light Verbs Classification?

Sercan Karakas; Yusuf Şimşek


Abstract

Light verb constructions (LVCs) are a challenging class of verbal multiword expressions, especially in Turkish, where rich morphology and productive complex predicates create minimal contrasts between idiomatic predicate meanings and literal verb–argument uses. This paper asks what signals drive LVC classification by systematically restricting model inputs. Using UD-derived supervision, we compare lemma-driven baselines (lemma TF–IDF + Logistic Regression; BERTurk trained on lemma sequences), a grammar-only Logistic Regression over UD morphosyntax (UPOS/DEPREL/MORPH), and a full-input BERTurk baseline. We evaluate on a controlled diagnostic set with Random negatives, lexical controls (NLVC), and LVC positives, reporting split-wise performance to expose decision-boundary behavior. Results show that coarse morphosyntax alone is insufficient for robust LVC detection under controlled contrasts, while lexical identity supports LVC judgments but is sensitive to calibration and normalization choices. Overall, our findings motivate targeted evaluation for Turkish MWEs and highlight that “lemma-only” is not a single representation but depends critically on how normalization is instantiated.
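The abstract contrasts a lemma-only input with a grammar-only input built from UD columns (UPOS/DEPREL/MORPH). As a minimal sketch, assuming standard CoNLL-U input, the snippet below derives both views from one hand-written, illustrative Turkish sentence. The sentence, the function names (`parse_conllu`, `lemma_view`, `grammar_view`, `turkish_lower`) and the two normalization flags (`lowercase`, `drop_punct`) are hypothetical; they are not the paper's preprocessing, which the abstract does not specify.

```python
# Hypothetical sketch: build two restricted input "views" (lemma sequence vs.
# UD morphosyntax) from one CoNLL-U sentence. The sentence and all names are
# illustrative assumptions, not taken from the paper.

CONLLU = """\
# text = Ali kitap okudu.
1\tAli\tAli\tPROPN\t_\tCase=Nom|Number=Sing|Person=3\t3\tnsubj\t_\t_
2\tkitap\tkitap\tNOUN\t_\tCase=Nom|Number=Sing|Person=3\t3\tobj\t_\t_
3\tokudu\toku\tVERB\t_\tAspect=Perf|Mood=Ind|Number=Sing|Person=3|Polarity=Pos|Tense=Past|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def parse_conllu(block):
    """Return one dict per syntactic word, keeping only the columns used below."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges ("1-2") and empty nodes ("1.1")
        tokens.append({"lemma": cols[2], "upos": cols[3],
                       "feats": cols[5], "deprel": cols[7]})
    return tokens


def turkish_lower(text):
    """Lowercase with Turkish dotted/dotless I (plain str.lower() maps I to i)."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def lemma_view(tokens, lowercase=False, drop_punct=False):
    """Lemma-only input; the two flags are example normalization choices."""
    lemmas = []
    for t in tokens:
        if drop_punct and t["upos"] == "PUNCT":
            continue
        lemmas.append(turkish_lower(t["lemma"]) if lowercase else t["lemma"])
    return " ".join(lemmas)


def grammar_view(tokens):
    """Grammar-only input: UPOS, DEPREL and MORPH features, no lexical identity."""
    feats = []
    for t in tokens:
        feats.append("UPOS=" + t["upos"])
        feats.append("DEPREL=" + t["deprel"])
        if t["feats"] != "_":
            feats.extend(t["feats"].split("|"))
    return feats


if __name__ == "__main__":
    toks = parse_conllu(CONLLU)
    print(lemma_view(toks))                                   # Ali kitap oku .
    print(lemma_view(toks, lowercase=True, drop_punct=True))  # ali kitap oku
    print(grammar_view(toks)[:4])  # ['UPOS=PROPN', 'DEPREL=nsubj', 'Case=Nom', 'Number=Sing']
```

Even in this toy setting, each flag combination yields a different "lemma-only" string for the same sentence, and Turkish casing needs explicit handling of dotted and dotless I, which is one concrete way normalization choices can change what a lemma-driven model sees. The grammar-only list could be fed to a bag-of-features linear model; it carries no lemma or word form.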

Anthology ID: 2026.sigturk-1.18
Volume: Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026)
Month: March
Year: 2026
Address: Rabat, Morocco
Editors: Kemal Oflazer, Abdullatif Köksal, Onur Varol
Venues: SIGTURK | WS
Publisher: Association for Computational Linguistics
Pages: 220–227
URL: https://aclanthology.org/2026.sigturk-1.18/
Cite (ACL): Sercan Karakas and Yusuf Şimşek. 2026. From Lemmas to Dependencies: What Signals Drive Light Verbs Classification?. In Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026), pages 220–227, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal): From Lemmas to Dependencies: What Signals Drive Light Verbs Classification? (Karakas & Şimşek, SIGTURK 2026)
PDF: https://aclanthology.org/2026.sigturk-1.18.pdf
