Anne-Catherine Bachoud-Lévi

Also published as: Anne-Catherine Bachoud-Levi


A comparison study on patient-psychologist voice diarization
Rachid Riad | Hadrien Titeux | Laurie Lemoine | Justine Montillot | Agnes Sliwinski | Jennifer Bagnou | Xuan Cao | Anne-Catherine Bachoud-Levi | Emmanuel Dupoux
Ninth Workshop on Speech and Language Processing for Assistive Technologies (SLPAT-2022)

Conversations between a clinician and a patient, in natural conditions, are valuable sources of information for medical follow-up. The automatic analysis of these dialogues could help extract new language markers and speed up the clinicians’ reports. Yet, it is not clear which model is the most efficient to detect and identify the speaker turns, especially for individuals with speech disorders. Here, we proposed a split of the data that allows conducting a comparative evaluation of different diarization methods. We designed and trained end-to-end neural network architectures to directly tackle this task from the raw signal and evaluated each approach under the same metric. We also studied the effect of fine-tuning models to find the best performance. Experimental results are reported on naturalistic clinical conversations between Psychologists and Interviewees, at different stages of Huntington’s disease, displaying a large panel of speech disorders. We found that our best end-to-end model achieved 19.5% IER on the test set, compared to 23.6% achieved by fine-tuning the X-vector architecture. Finally, we observed that we could extract clinical markers directly from the automatic systems, highlighting the clinical relevance of our methods.


Identification of Primary and Collateral Tracks in Stuttered Speech
Rachid Riad | Anne-Catherine Bachoud-Lévi | Frank Rudzicz | Emmanuel Dupoux
Proceedings of the 12th Language Resources and Evaluation Conference

Disfluent speech has previously been addressed from two main perspectives: the clinical perspective, focusing on diagnosis, and the Natural Language Processing (NLP) perspective, aiming at modeling these events and detecting them for downstream tasks. In addition, previous works often used different metrics depending on whether the input features are text or speech, making it difficult to compare the different contributions. Here, we introduce a new evaluation framework for disfluency detection inspired by the clinical and NLP perspectives together with the theory of performance of Clark (1996), which distinguishes between primary and collateral tracks. We introduce a novel forced-aligned disfluency dataset from a corpus of semi-directed interviews, and present baseline results directly comparing the performance of text-based features (word and span information) and speech-based features (acoustic-prosodic information). Finally, we introduce new audio features inspired by the word-based span features. We show experimentally that using these features outperforms the baselines for speech-based predictions on the present dataset.

Seshat: a Tool for Managing and Verifying Annotation Campaigns of Audio Data
Hadrien Titeux | Rachid Riad | Xuan-Nga Cao | Nicolas Hamilakis | Kris Madden | Alejandrina Cristia | Anne-Catherine Bachoud-Lévi | Emmanuel Dupoux
Proceedings of the 12th Language Resources and Evaluation Conference

We introduce Seshat, a new, simple and open-source software tool to efficiently manage annotations of speech corpora. The Seshat software allows users to easily customise and manage annotations of large audio corpora while ensuring compliance with the formatting and naming conventions of the annotated output files. In addition, it includes procedures for checking the content of annotations following specific rules that can be implemented in personalised parsers. Finally, we propose a double-annotation mode, for which Seshat automatically computes an associated inter-annotator agreement with the gamma measure, taking into account the categorisation and segmentation discrepancies.