Deniz Zeyrek


2024

Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)
Duygu Ataman | Mehmet Oguz Derin | Sardana Ivanova | Abdullatif Köksal | Jonne Sälevä | Deniz Zeyrek
Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)

Lightweight Connective Detection Using Gradient Boosting
Mustafa Erolcan Er | Murathan Kurfalı | Deniz Zeyrek
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024

In this work, we introduce a lightweight discourse connective detection system. Employing gradient boosting trained on straightforward, low-complexity features, the proposed approach sidesteps the computational demands of current approaches that rely on deep neural networks. Considering its simplicity, our approach achieves competitive results while offering significant gains in terms of time, even on CPU. Furthermore, the stable performance across two unrelated languages suggests the robustness of our system in the multilingual scenario. The model is designed to support the annotation of discourse relations, particularly in scenarios with limited resources, while minimizing performance loss.

Multiple Discourse Relations in English TED Talks and Their Translation into Lithuanian, Portuguese and Turkish
Deniz Zeyrek | Giedrė Valūnaitė Oleškevičienė | Amalia Mendes
Proceedings of the 17th Workshop on Building and Using Comparable Corpora (BUCC) @ LREC-COLING 2024

2023

Annotating and Disambiguating the Discourse Usage of the Enclitic dA in Turkish
Ebru Ersöyleyen | Deniz Zeyrek | Fırat Öter
Proceedings of the 17th Linguistic Annotation Workshop (LAW-XVII)

The Turkish particle dA is a focus-associated enclitic, and it can act as a discourse connective conveying multiple senses, such as additive, contrastive, causal, etc. Like many other linguistic expressions, it is subject to usage ambiguity and creates a challenge in natural language automatization tasks. For the first time, we annotate the discourse and non-discourse connective occurrences of dA in Turkish with the PDTB principles. Using a minimal set of linguistic features, we develop binary classifiers to distinguish its discourse connective usage from its other usages. We show that despite its ability to cliticize to any syntactic type, its variable position in the sentence and its wide argument span, its discourse/non-discourse connective usage can be annotated reliably and its discourse usage can be disambiguated by exploiting local cues.

2020

Turkish Emotion Voice Database (TurEV-DB)
Salih Firat Canpolat | Zuhal Ormanoğlu | Deniz Zeyrek
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

We introduce the Turkish Emotion-Voice Database (TurEV-DB), which involves a corpus of over 1700 tokens based on 82 words uttered by human subjects in four different emotions (angry, calm, happy, sad). Three machine learning experiments are run on the corpus data to classify the emotions using a convolutional neural network (CNN) model and a support vector machine (SVM) model. We report the performance of the machine learning models and, for evaluation, compare the machine learning results with human judgements.

TED-MDB Lexicons: Tr-EnConnLex, Pt-EnConnLex
Murathan Kurfalı | Sibel Ozer | Deniz Zeyrek | Amália Mendes
Proceedings of the First Workshop on Computational Approaches to Discourse

In this work, we present two new bilingual discourse connective lexicons, namely, for Turkish-English and European Portuguese-English created automatically using the existing discourse relation-aligned TED-MDB corpus. In their current form, the Pt-En lexicon includes 95 entries, whereas the Tr-En lexicon contains 133 entries. The lexicons constitute the first step of a larger project of developing a multilingual discourse connective lexicon.

2019

TCL - a Lexicon of Turkish Discourse Connectives
Deniz Zeyrek | Kezban Başıbüyük
Proceedings of the First International Workshop on Designing Meaning Representations

It is known that discourse connectives are the most salient indicators of discourse relations. State-of-the-art parsers being developed to predict explicit discourse connectives exploit annotated discourse corpora, but a lexicon of discourse connectives is also needed to enable further research in discourse structure and support the development of language technologies that use these structures for text understanding. This paper presents a lexicon of Turkish discourse connectives built by automatic means. The lexicon has the format of the German connective lexicon, DiMLex, where for each discourse connective, information about the connective’s orthographic variants, syntactic category and senses is provided along with sample relations. In this paper, we describe the data sources we used and the development steps of the lexicon.

An automatic discourse relation alignment experiment on TED-MDB
Sibel Ozer | Deniz Zeyrek
Proceedings of the 2019 Workshop on Widening NLP

This paper describes an automatic discourse relation alignment experiment as an empirical justification of the planned annotation projection approach to enlarge the 3600-word multilingual corpus of TED Multilingual Discourse Bank (TED-MDB). The experiment is carried out on a single language pair (English-Turkish) included in TED-MDB. The paper first describes the creation of a large corpus of English-Turkish bi-sentences, then it presents a sense-based experiment that automatically aligns the relations in the English sentences of TED-MDB with the Turkish sentences. The results are very close to the results obtained from an earlier semi-automatic post-annotation alignment experiment validated by human annotators and are encouraging for future annotation projection tasks.

Proceedings of the 13th Linguistic Annotation Workshop
Annemarie Friedrich | Deniz Zeyrek | Jet Hoek
Proceedings of the 13th Linguistic Annotation Workshop

2018

Multilingual Extension of PDTB-Style Annotation: The Case of TED Multilingual Discourse Bank
Deniz Zeyrek | Amália Mendes | Murathan Kurfalı
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

An Assessment of Explicit Inter- and Intra-sentential Discourse Connectives in Turkish Discourse Bank
Deniz Zeyrek | Murathan Kurfalı
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

TDB 1.1: Extensions on Turkish Discourse Bank
Deniz Zeyrek | Murathan Kurfalı
Proceedings of the 11th Linguistic Annotation Workshop

This paper presents the recent developments on Turkish Discourse Bank (TDB). First, the resource is summarized and an evaluation is presented. Then, TDB 1.1, i.e. enrichments on 10% of the corpus are described (namely, senses for explicit discourse connectives, and new annotations for three discourse relation types - implicit relations, entity relations and alternative lexicalizations). The method of annotation is explained and the data are evaluated.

2016

A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
Elif Ahsen Acar | Deniz Zeyrek | Murathan Kurfalı | Cem Bozşahin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This study primarily aims to build a Turkish psycholinguistic database including three variables: word frequency, age of acquisition (AoA), and imageability, where AoA and imageability information are limited to nouns. We used a corpus-based approach to obtain information about the AoA variable. We built two corpora: a child literature corpus (CLC) including 535 books written for children aged 3-12, and a corpus of transcribed children’s speech (CSC) at ages 1;4-4;8. A comparison between the word frequencies of CLC and CSC gave positive correlation results, suggesting the usability of the CLC to extract AoA information. We assumed that frequent words of the CLC would correspond to early acquired words whereas frequent words of a corpus of adult language would correspond to late acquired words. To validate AoA results from our corpus-based approach, a rated AoA questionnaire was conducted on adults. Imageability values were collected via a different questionnaire conducted on adults. We conclude that it is possible to deduce AoA information for high frequency words with the corpus-based approach. The results about low frequency words were inconclusive, which is attributed to the fact that corpus-based AoA information is affected by the strong negative correlation between corpus frequency and rated AoA.

2014

Turkish Resources for Visual Word Recognition
Begüm Erten | Cem Bozsahin | Deniz Zeyrek
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We report two tools to conduct psycholinguistic experiments on Turkish words. KelimetriK allows experimenters to choose words based on desired orthographic scores of word frequency, bigram and trigram frequency, ON, OLD20, ATL and subset/superset similarity. The Turkish version of Wuggy generates pseudowords from one or more template words using an efficient method. The syllabified versions of the words are used as the input and are decomposed into their sub-syllabic components. The bigram frequency chains are constructed by the entire words’ onset, nucleus and coda patterns. Lexical statistics of stems and their syllabification are compiled by us from the BOUN corpus of 490 million words. Use of these tools in some experiments is shown.

Annotating Discourse Connectives in Spoken Turkish
Isin Demirşahin | Deniz Zeyrek
Proceedings of LAW VIII - The 8th Linguistic Annotation Workshop

2013

Applicative Structures and Immediate Discourse in the Turkish Discourse Bank
Isin Demirşahin | Adnan Öztürel | Cem Bozşahin | Deniz Zeyrek
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

2012

Pair Annotation: Adaption of Pair Programming to Corpus Annotation
Isin Demirşahin | İhsan Yalcinkaya | Deniz Zeyrek
Proceedings of the Sixth Linguistic Annotation Workshop

METU Turkish Discourse Bank Browser
Utku Şirin | Ruket Çakıcı | Deniz Zeyrek
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, the METU Turkish Discourse Bank Browser, a tool developed for browsing the annotated discourse relations in the Middle East Technical University (METU) Turkish Discourse Bank (TDB) project, is presented. The tool provides both a clear interface for browsing the annotated corpus and a wide range of search options to analyze the annotations.

2010

Discourse Relation Configurations in Turkish and an Annotation Environment
Berfin Aktaş | Cem Bozsahin | Deniz Zeyrek
Proceedings of the Fourth Linguistic Annotation Workshop

The Annotation Scheme of the Turkish Discourse Bank and an Evaluation of Inconsistent Annotations
Deniz Zeyrek | Işin Demirşahin | Ayişiği Sevdik-Çalli | Hale Ögel Balaban | İhsan Yalçinkaya | Ümit Deniz Turan
Proceedings of the Fourth Linguistic Annotation Workshop

Unaccusative/Unergative Distinction in Turkish: A Connectionist Approach
Cengiz Acartürk | Deniz Zeyrek
Proceedings of the Eighth Workshop on Asian Language Resources

2009

Annotating Subordinators in the Turkish Discourse Bank
Deniz Zeyrek | Umit Deniz Turan | Cem Bozsahin | Ruket Cakici | Ayisigi B. Sevdik-Calli | Isin Demirsahin | Berfin Aktas | İhsan Yalcinkaya | Hale Ogel
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

2008

A Discourse Resource for Turkish: Annotating Discourse Connectives in the METU Corpus
Deniz Zeyrek | Bonnie Webber
Proceedings of the 6th Workshop on Asian Language Resources