Caroline Williams


The Cambridge Cookie-Theft Corpus: A Corpus of Directed and Spontaneous Speech of Brain-Damaged Patients and Healthy Individuals
Caroline Williams | Andrew Thwaites | Paula Buttery | Jeroen Geertzen | Billi Randall | Meredith Shafto | Barry Devereux | Lorraine Tyler
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Investigating differences in linguistic usage between individuals who have suffered brain injury (hereafter patients) and those who have not can yield a number of benefits. It provides a better understanding of the precise ways in which impairments affect patients’ language, improves theories of how the brain processes language, and offers heuristics for diagnosing certain types of brain damage from patients’ speech. One method for investigating usage differences is the analysis of spontaneous speech. In the work described here, we construct a text corpus consisting of transcripts of individuals’ speech produced during two tasks: the Boston Cookie Theft picture description task (Goodglass and Kaplan, 1983) and a spontaneous speech task, which elicits a semi-prompted monologue and/or free speech. Interviews with patients aged 19 to 89 years were transcribed, as were interviews with a comparable number of healthy individuals aged 20 to 89 years. Structural brain images are available for approximately 30% of participants. This unique data source provides a rich resource for future research in many areas of language impairment, and it has been constructed to facilitate analysis with natural language processing and corpus linguistics techniques.
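As a rough illustration of the kind of corpus-linguistic comparison such transcripts support, the sketch below computes two simple per-group measures for one elicitation task: type-token ratio (distinct words divided by total words) and the proportion of filled pauses. It is a minimal sketch under stated assumptions, not the corpus's own tooling. The record layout `(group, task, text)`, the task label `cookie_theft`, the filled-pause set, and the helper names (`tokenize`, `type_token_ratio`, `filled_pause_rate`, `summarize`) are all hypothetical, and the two utterances are invented for illustration rather than drawn from the corpus.

```python
import re
from statistics import mean

# Hypothetical records (group, task, transcript text). These utterances are
# invented for illustration and are NOT taken from the corpus.
TRANSCRIPTS = [
    ("patient", "cookie_theft", "the boy is um the boy is on the stool"),
    ("control", "cookie_theft", "a boy is standing on a stool and reaching for the cookie jar"),
]

# Assumed set of filled-pause tokens; the real transcription scheme may mark these differently.
FILLED_PAUSES = {"um", "uh", "er", "erm"}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (apostrophes kept)."""
    return TOKEN_RE.findall(text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct word types divided by total tokens; 0.0 for an empty transcript."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def filled_pause_rate(tokens: list[str]) -> float:
    """Proportion of tokens that are filled pauses such as 'um' or 'uh'."""
    return sum(t in FILLED_PAUSES for t in tokens) / len(tokens) if tokens else 0.0


def summarize(transcripts, task: str) -> dict[str, dict[str, float]]:
    """Average each measure per speaker group, restricted to one elicitation task."""
    per_group: dict[str, list[list[str]]] = {}
    for group, t, text in transcripts:
        if t == task:
            per_group.setdefault(group, []).append(tokenize(text))
    return {
        group: {
            "ttr": mean(type_token_ratio(toks) for toks in docs),
            "filled_pause_rate": mean(filled_pause_rate(toks) for toks in docs),
        }
        for group, docs in per_group.items()
    }


if __name__ == "__main__":
    for group, stats in summarize(TRANSCRIPTS, "cookie_theft").items():
        print(group, {k: round(v, 3) for k, v in stats.items()})
```

Raw type-token ratio falls as transcripts get longer, so a comparison across speakers who talk for different lengths of time would likely need a length-normalised variant; the plain ratio is used here only to keep the sketch short.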