Ali Gebeşçe
2026
Overview of the SIGTURK 2026 Shared Task: Terminology-Aware Machine Translation for English–Turkish Scientific Texts
Ali Gebeşçe | Abdulfattah Safa | Ege Uğur Amasya | Gözde Gül Şahin
Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026)
This paper presents an overview of the SIGTURK 2026 Shared Task on Terminology-Aware Machine Translation for English-Turkish Scientific Texts. We address the critical challenge of terminological accuracy in low-resource settings by constructing the first terminology-rich English-Turkish parallel corpus, comprising 3,300 sentence pairs from STEM domains with 10,157 expert-validated term pairs. The shared task consists of three subtasks: term detection, expert-guided correction, and end-to-end post-editing. We evaluate state-of-the-art baselines (including GPT-5.2 and Claude Sonnet 4.5) alongside participant systems employing diverse strategies from fine-tuning to Retrieval-Augmented Generation (RAG). Our results highlight that while massive generalist models dominate zero-shot detection, smaller, domain-adapted models using Supervised Fine-Tuning and Reinforcement Learning can significantly outperform them in end-to-end post-editing. Furthermore, we find that rigid retrieval pipelines often disrupt fluency, whereas Chain-of-Thought prompting allows models to integrate terminology more naturally. Despite these advances, a significant gap remains between automated systems and human expert performance in strict terminology correction.
2025
GECTurk WEB: An Explainable Online Platform for Turkish Grammatical Error Detection and Correction
Ali Gebeşçe | Gözde Gül Şahin
Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations
Sophisticated grammatical error detection/correction tools are available for a small set of languages such as English and Chinese. However, it is difficult, if not impossible, to adapt them to morphologically rich languages with complex writing rules like Turkish, which has more than 80 million speakers. Even though several tools exist for Turkish, they primarily focus on spelling errors rather than grammatical errors and lack features such as web interfaces, error explanations, and feedback mechanisms. To fill this gap, we introduce GECTurk WEB, a light, open-source, and flexible web-based system that can detect and correct the most common forms of Turkish writing errors, such as the misuse of diacritics, compound and foreign words, pronouns, and light verbs, along with spelling mistakes. Our system provides native speakers and second language learners with an easily accessible tool to detect and correct such mistakes, and to learn from them by showing an explanation of the violated rule(s). The proposed system achieves a system usability score of 88.3 and is shown to help users learn or remember a grammatical rule (confirmed by 80% of the participants). GECTurk WEB is available both as an offline tool (https://github.com/GGLAB-KU/gecturkweb) and online at www.gecturk.net.