Semantic Frame Annotation on the French MEDIA corpus

Marie-Jean Meurs, Frédéric Duvert, Frédéric Béchet, Fabrice Lefèvre, Renato de Mori


Abstract
This paper introduces a knowledge representation formalism used for annotation of the French MEDIA dialogue corpus in terms of high level semantic structures. The semantic annotation, worked out according to the Berkeley FrameNet paradigm, is incremental and partially automated. We describe an automatic interpretation process for composing semantic structures from basic semantic constituents using patterns involving words and constituents. This process contains procedures which provide semantic compositions and generating frame hypotheses by inference. The MEDIA corpus is a French dialogue corpus recorded using a Wizard of Oz system simulating a telephone server for tourist information and hotel booking. It had been manually transcribed and annotated at the word and semantic constituent levels. These levels support the automatic interpretation process which provides a high level semantic frame annotation. The Frame based Knowledge Source we composed contains Frame definitions and composition rules. We finally provide some results obtained on the automatically-derived annotation.
Anthology ID:
L08-1491
Volume:
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Month:
May
Year:
2008
Address:
Marrakech, Morocco
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/256_paper.pdf
DOI:
Bibkey:
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2008/pdf/256_paper.pdf