A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability

Elif Ahsen Acar, Deniz Zeyrek, Murathan Kurfalı, Cem Bozşahin


Abstract
This study primarily aims to build a Turkish psycholinguistic database covering three variables: word frequency, age of acquisition (AoA), and imageability, with AoA and imageability information limited to nouns. We used a corpus-based approach to obtain AoA information. We built two corpora: a child literature corpus (CLC) of 535 books written for children aged 3 to 12, and a corpus of transcribed children’s speech (CSC) from children aged 1;4 to 4;8. Word frequencies in the CLC and the CSC were positively correlated, suggesting that the CLC can be used to extract AoA information. We assumed that frequent words in the CLC would correspond to early-acquired words, whereas frequent words in a corpus of adult language would correspond to late-acquired words. To validate the AoA results from the corpus-based approach, we administered an AoA rating questionnaire to adults. Imageability values were collected through a separate questionnaire, also administered to adults. We conclude that AoA information can be deduced for high-frequency words with the corpus-based approach. The results for low-frequency words were inconclusive, which we attribute to the fact that corpus-based AoA information is affected by the strong negative correlation between corpus frequency and rated AoA.
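The comparison of word frequencies between the CLC and the CSC can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which correlation measure, tokenization, or frequency normalization was used, so the Pearson coefficient over relative frequencies of shared words is an assumption here. The function names and the toy token lists are hypothetical and invented for illustration only.

```python
from collections import Counter
from math import sqrt


def word_frequencies(tokens):
    """Return the relative frequency of each token in a tokenized corpus."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def frequency_correlation(corpus_a, corpus_b):
    """Correlate relative frequencies over the words both corpora share."""
    freq_a = word_frequencies(corpus_a)
    freq_b = word_frequencies(corpus_b)
    shared = sorted(set(freq_a) & set(freq_b))
    return pearson([freq_a[w] for w in shared], [freq_b[w] for w in shared])


# Toy token lists standing in for the CLC and the CSC (assumed data,
# not taken from the paper).
clc_tokens = "anne top kedi top anne su kedi anne".split()
csc_tokens = "anne anne top su anne kedi top".split()

r = frequency_correlation(clc_tokens, csc_tokens)
print(f"Frequency correlation over shared words: {r:.3f}")
```

A positive coefficient on real data would be in line with the relationship the abstract reports. A rank-based measure such as Spearman's could be substituted if the frequency distributions are heavily skewed, which is typical of word counts.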
Anthology ID:
L16-1571
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3600–3606
URL:
https://aclanthology.org/L16-1571
Cite (ACL):
Elif Ahsen Acar, Deniz Zeyrek, Murathan Kurfalı, and Cem Bozşahin. 2016. A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 3600–3606, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Turkish Database for Psycholinguistic Studies Based on Frequency, Age of Acquisition, and Imageability (Acar et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1571.pdf