The CLARIN Knowledge Centre for Atypical Communication Expertise
Henk van den Heuvel | Nelleke Oostdijk | Caroline Rowland | Paul Trilsbeek
Proceedings of the Twelfth Language Resources and Evaluation Conference
This paper introduces a new CLARIN Knowledge Center which is the K-Centre for Atypical Communication Expertise (ACE for short) which has been established at the Centre for Language and Speech Technology (CLST) at Radboud University. Atypical communication is an umbrella term used here to denote language use by second language learners, people with language disorders or those suffering from language disabilities, but also more broadly by bilinguals and users of sign languages. It involves multiple modalities (text, speech, sign, gesture) and encompasses different developmental stages. ACE closely collaborates with The Language Archive (TLA) at the Max Planck Institute for Psycholinguistics in order to safeguard GDPR-compliant data storage and access. We explain the mission of ACE and show its potential on a number of showcases and a use case.
Continuous vector word representations (or word embeddings) have shown success in capturing semantic relations between words, as evidenced with evaluation against behavioral data of adult performance on semantic tasks (Pereira et al. 2016). Adult semantic knowledge is the endpoint of a language acquisition process; thus, a relevant question is whether these models can also capture emerging word representations of young language learners. However, the data of semantic knowledge of children is scarce or non-existent for some age groups. In this paper, we propose to bridge this gap by using Age of Acquisition norms to evaluate word embeddings learnt from child-directed input. We present two methods that evaluate word embeddings in terms of (a) the semantic neighbourhood density of learnt words, and (b) the convergence to adult word associations. We apply our methods to bag-of-words models, and we find that (1) children acquire words with fewer semantic neighbours earlier, and (2) young learners only attend to very local context. These findings provide converging evidence for validity of our methods in understanding the prerequisite features for a distributional model of word learning.