A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability
Elif Ahsen Acar
Deniz Zeyrek
Murathan Kurfalı
Cem Bozşahin
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
This study primarily aims to build a Turkish psycholinguistic database covering three variables: word frequency, age of acquisition (AoA), and imageability, where AoA and imageability information is limited to nouns. We used a corpus-based approach to obtain information about the AoA variable. We built two corpora: a child literature corpus (CLC) comprising 535 books written for children aged 3-12, and a corpus of transcribed children’s speech (CSC) at ages 1;4-4;8. A comparison between the word frequencies of the CLC and the CSC showed a positive correlation, suggesting that the CLC can be used to extract AoA information. We assumed that frequent words of the CLC would correspond to early acquired words, whereas frequent words of a corpus of adult language would correspond to late acquired words. To validate the AoA results from our corpus-based approach, a rated AoA questionnaire was administered to adults. Imageability values were collected via a separate questionnaire, also administered to adults. We conclude that it is possible to deduce AoA information for high-frequency words with the corpus-based approach. The results for low-frequency words were inconclusive, which we attribute to the fact that corpus-based AoA information is affected by the strong negative correlation between corpus frequency and rated AoA.
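The frequency comparison between two corpora mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not say which correlation measure or preprocessing was used, so everything here is an assumption: the function name `log_freq_correlation` is hypothetical, the measure is a Pearson correlation over log frequencies of the shared vocabulary, and the token lists are toy data rather than material from the CLC or CSC.

```python
import math
from collections import Counter


def log_freq_correlation(tokens_a, tokens_b):
    """Pearson correlation of log word frequencies over the shared vocabulary.

    Hypothetical helper for illustration; not the authors' procedure.
    """
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    # Only words attested in both corpora can be compared.
    shared = sorted(freq_a.keys() & freq_b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared words")

    xs = [math.log(freq_a[w]) for w in shared]
    ys = [math.log(freq_b[w]) for w in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("frequencies are constant in one corpus")
    return cov / math.sqrt(var_x * var_y)


# Toy token lists (assumed data, for illustration only).
corpus_a = ["anne"] * 4 + ["top"] * 2 + ["kedi"]
corpus_b = ["anne"] * 8 + ["top"] * 4 + ["kedi"] * 2 + ["su"]
corpus_c = ["anne"] + ["top"] * 2 + ["kedi"] * 4

r_same_ranking = log_freq_correlation(corpus_a, corpus_b)
r_reversed_ranking = log_freq_correlation(corpus_a, corpus_c)
```

Log frequencies are used here because raw word counts are heavily skewed; a rank-based measure would be an equally plausible choice.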
Turkish Resources for Visual Word Recognition
Begüm Erten
Cem Bozsahin
Deniz Zeyrek
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We report two tools for conducting psycholinguistic experiments on Turkish words. KelimetriK allows experimenters to choose words based on desired orthographic scores of word frequency, bigram and trigram frequency, ON, OLD20, ATL and subset/superset similarity. The Turkish version of Wuggy generates pseudowords from one or more template words using an efficient method. The syllabified versions of the words are used as input and are decomposed into their sub-syllabic components. The bigram frequency chains are constructed from the onset, nucleus and coda patterns of entire words. We compiled lexical statistics of stems and their syllabification from the BOUN corpus of 490 million words. We show the use of these tools in some experiments.
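Two of the orthographic scores named in the abstract above, ON and OLD20, can be sketched under their standard definitions: ON as the number of same-length words differing by a single letter substitution, and OLD20 as the mean Levenshtein distance to the 20 closest words in the lexicon. This is not KelimetriK's code; the function names and the toy lexicon are assumptions for illustration, and `k` is exposed as a parameter because the toy lexicon has fewer than 20 words.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def orthographic_neighbors(word, lexicon):
    """Same-length words that differ from `word` in exactly one position."""
    return [w for w in lexicon
            if w != word and len(w) == len(word)
            and sum(x != y for x, y in zip(w, word)) == 1]


def old_k(word, lexicon, k=20):
    """Mean Levenshtein distance to the k closest other words (OLD20 for k=20)."""
    distances = sorted(levenshtein(word, w) for w in lexicon if w != word)
    nearest = distances[:k]
    if not nearest:
        raise ValueError("lexicon has no other words")
    return sum(nearest) / len(nearest)


# Toy lexicon (assumed data, not drawn from the BOUN corpus).
toy_lexicon = ["kal", "kol", "kul", "kel", "bal", "kale", "kalem", "masa"]

on_kal = len(orthographic_neighbors("kal", toy_lexicon))
old4_kal = old_k("kal", toy_lexicon, k=4)
```

Strings are compared by Unicode code point, so input containing Turkish letters such as ş or ğ should be normalized to composed form (NFC) first to keep each letter a single character.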
Applicative Structures and Immediate Discourse in the Turkish Discourse Bank
Isin Demirşahin
Adnan Öztürel
Cem Bozşahin
Deniz Zeyrek
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse
Discourse Relation Configurations in Turkish and an Annotation Environment
Berfin Aktaş
Cem Bozsahin
Deniz Zeyrek
Proceedings of the Fourth Linguistic Annotation Workshop
Annotating Subordinators in the Turkish Discourse Bank
Deniz Zeyrek
Umit Deniz Turan
Cem Bozsahin
Ruket Cakici
Ayisigi B. Sevdik-Calli
Isin Demirsahin
Berfin Aktas
İhsan Yalcinkaya
Hale Ogel
Proceedings of the Third Linguistic Annotation Workshop (LAW III)
The Combinatory Morphemic Lexicon
Cem Bozsahin
Computational Linguistics, Volume 28, Number 2, June 2002
Deriving the Predicate-Argument Structure for a Free Word Order Language
Cem Bozsahin
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics
Deriving the Predicate-Argument Structure for a Free Word Order Language
Cem Bozsahin
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1
Morphological Productivity in the Lexicon
Onur T. Sehitoglu
H. Cem Bozsahin
Breadth and Depth of Semantic Lexicons