2009
pdf
bib
abs
Segmentation et classification non supervisée de conversations téléphoniques automatiquement retranscrites
Laurent Bozzi
|
Philippe Suignard
|
Claire Waast-Richard
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles courts
Cette étude porte sur l’analyse de conversations entre des clients et des téléconseillers d’EDF. Elle propose une chaîne de traitements permettant d’automatiser la détection des sujets abordés dans chaque conversation. L’aspect multi-thématique des conversations nous incite à trouver une unité de documents entre le simple tour de parole et la conversation entière. Cette démarche enchaîne une étape de segmentation de la conversation en thèmes homogènes basée sur la notion de cohésion lexicale, puis une étape de text-mining comportant une analyse linguistique enrichie d’un vocabulaire métier spécifique à EDF, et enfin une classification non supervisée des segments obtenus. Plusieurs algorithmes de segmentation ont été évalués sur un corpus de test, segmenté et annoté manuellement : le plus « proche » de la segmentation de référence est C99. Cette démarche, appliquée à la fois sur un corpus de conversations transcrites à la main, et sur les mêmes conversations décodées par un moteur de reconnaissance vocale, aboutit quasiment à l’obtention des 20 mêmes classes thématiques.
2008
pdf
bib
abs
CallSurf: Automatic Transcription, Indexing and Structuration of Call Center Conversational Speech for Knowledge Extraction and Query by Content
Martine Garnier-Rizet
|
Gilles Adda
|
Frederik Cailliau
|
Sylvie Guillemin-Lanne
|
Claire Waast-Richard
|
Lori Lamel
|
Stephan Vanni
|
Claire Waast-Richard
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Being the clients first interface, call centres worldwide contain a huge amount of information of all kind under the form of conversational speech. If accessible, this information can be used to detect eg. major events and organizational flaws, improve customer relations and marketing strategies. An efficient way to exploit the unstructured data of telephone calls is data-mining, but current techniques apply on text only. The CallSurf project gathers a number of academic and industrial partners covering the complete platform, from automatic transcription to information retrieval and data mining. This paper concentrates on the speech recognition module as it discusses the collection, the manual transcription of the training corpus and the techniques used to build the language model. The NLP techniques used to pre-process the transcribed corpus for data mining are POS tagging, lemmatization, noun group and named entity recognition. Some of them have been especially adapted to the conversational speech characteristics. POS tagging and preliminary data mining results obtained on the manually transcribed corpus are briefly discussed.
pdf
bib
abs
CallSurf: Automatic Transcription, Indexing and Structuration of Call Center Conversational Speech for Knowledge Extraction and Query by Content
Martine Garnier-Rizet
|
Gilles Adda
|
Frederik Cailliau
|
Sylvie Guillemin-Lanne
|
Claire Waast-Richard
|
Lori Lamel
|
Stephan Vanni
|
Claire Waast-Richard
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Being the clients first interface, call centres worldwide contain a huge amount of information of all kind under the form of conversational speech. If accessible, this information can be used to detect eg. major events and organizational flaws, improve customer relations and marketing strategies. An efficient way to exploit the unstructured data of telephone calls is data-mining, but current techniques apply on text only. The CallSurf project gathers a number of academic and industrial partners covering the complete platform, from automatic transcription to information retrieval and data mining. This paper concentrates on the speech recognition module as it discusses the collection, the manual transcription of the training corpus and the techniques used to build the language model. The NLP techniques used to pre-process the transcribed corpus for data mining are POS tagging, lemmatization, noun group and named entity recognition. Some of them have been especially adapted to the conversational speech characteristics. POS tagging and preliminary data mining results obtained on the manually transcribed corpus are briefly discussed.