Kemal Oflazer

2026

Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026)
Kemal Oflazer | Abdullatif Köksal | Onur Varol
Proceedings of the Second Workshop Natural Language Processing for Turkic Languages (SIGTURK 2026)

2024

Nullpointer at ArAIEval Shared Task: Arabic Propagandist Technique Detection with Token-to-Word Mapping in Sequence Tagging
Abrar Abir | Kemal Oflazer
Proceedings of the Second Arabic Natural Language Processing Conference

This paper investigates the optimization of propaganda technique detection in Arabic text, including tweets and news paragraphs, from ArAIEval shared task 1. Our approach involves fine-tuning the AraBERT v2 model with a neural network classifier for sequence tagging. Experimental results show that relying on the first token of the word for technique prediction produces the best performance. In addition, incorporating genre information as a feature further enhances the model’s performance. Our system achieved a score of 25.41, placing us 4th on the leaderboard. Subsequent post-submission improvements further raised our score to 26.68.

2023

Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko’s (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results—through the lens of morphology—cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading.

2021

Semantic Similarity Based Evaluation for Abstractive News Summarization
Figen Beken Fikri | Kemal Oflazer | Berrin Yanikoglu
Proceedings of the First Workshop on Natural Language Generation, Evaluation, and Metrics (GEM)

ROUGE is a widely used evaluation metric in text summarization. However, it is not suitable for the evaluation of abstractive summarization systems as it relies on lexical overlap between the gold standard and the generated summaries. This limitation becomes more apparent for agglutinative languages with very large vocabularies and high type/token ratios. In this paper, we present semantic similarity models for Turkish and apply them as evaluation metrics for an abstractive summarization task. To achieve this, we translated the English STSb dataset into Turkish and presented the first semantic textual similarity dataset for Turkish as well. We showed that our best similarity models have better alignment with average human judgments compared to ROUGE in both Pearson and Spearman correlations.
