Multiword Expressions in Child Language

Rodrigo Wilkens, Marco Idiart, Aline Villavicencio


Abstract
The goal of this work is to introduce CHILDES-MWE, which contains English CHILDES corpora automatically annotated with Multiword Expression (MWE) information. The result is a resource with almost 350,000 sentences annotated with more than 70,000 distinct MWEs of various types from both longitudinal and latitudinal corpora. This resource can be used for large-scale language acquisition studies of how MWEs feature in child language. Focusing on compound nouns (CNs), we then verify in a longitudinal study whether there are differences in the distribution and compositionality of CNs in child-directed and child-produced sentences across ages. Moreover, using additional latitudinal data, we investigate whether there are further differences in CN usage and in compositionality preferences. The results obtained for the child-produced sentences reflect CN distribution and compositionality in child-directed sentences.
Anthology ID:
L16-1365
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
2307–2311
URL:
https://aclanthology.org/L16-1365
Cite (ACL):
Rodrigo Wilkens, Marco Idiart, and Aline Villavicencio. 2016. Multiword Expressions in Child Language. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 2307–2311, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
Multiword Expressions in Child Language (Wilkens et al., LREC 2016)
PDF:
https://aclanthology.org/L16-1365.pdf