Semantic annotation of French corpora: animacy and verb semantic classes

Juliette Thuilier, Laurence Danlos


Abstract
This paper presents a first corpus of French annotated for animacy and for verb semantic classes. The resource consists of 1,346 sentences extracted from three different corpora: the French Treebank (Abeillé and Barrier, 2004), the Est-Républicain corpus (CNRTL) and the ESTER corpus (ELRA). It is a set of parsed sentences, each containing a verbal head that subcategorizes for two complements, with annotations on the verb and on both complements, in the TIGER XML format (Mengel and Lezius, 2000). The resource was manually annotated and manually corrected by three annotators. Animacy has been annotated following the categories of Zaenen et al. (2004). Measures of inter-annotator agreement are good: Multi-pi = 0.82 and Multi-kappa = 0.86 (k = 3, N = 2360). As for verb semantic classes, we used three of the five levels of classification of an existing dictionary: 'Les Verbes du Français' (Dubois and Dubois-Charlier, 1997). For the highest level (generic classes), the measures of agreement are Multi-pi = 0.84 and Multi-kappa = 0.87 (k = 3, N = 1346). The inter-annotator agreements show that the annotated data are reliable for both animacy and verbal semantic classes.
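The abstract reports Multi-pi and Multi-kappa for k = 3 annotators without stating the formulas. The sketch below assumes the usual definitions (Fleiss' multi-pi, with chance agreement taken from the pooled label distribution, and the Davies and Fleiss multi-kappa, with chance agreement taken from each annotator's own distribution); the paper itself may have computed them differently. The function names and the toy labels (HUM, ORG, CONC) are illustrative only and are not taken from the resource.

```python
from collections import Counter
from itertools import combinations


def observed_agreement(items):
    """Mean proportion of agreeing annotator pairs over all items.

    `items` is a list of tuples, one tuple of labels per annotated unit,
    with the same number of annotators for every unit.
    """
    c = len(items[0])                       # number of annotators (k)
    ordered_pairs = c * (c - 1)
    agreeing = 0
    for labels in items:
        counts = Counter(labels)
        agreeing += sum(n * (n - 1) for n in counts.values())
    return agreeing / (len(items) * ordered_pairs)


def multi_pi(items):
    """Multi-pi: expected agreement from the pooled label distribution."""
    a_o = observed_agreement(items)
    total = len(items) * len(items[0])
    pooled = Counter(label for labels in items for label in labels)
    a_e = sum((n / total) ** 2 for n in pooled.values())
    return (a_o - a_e) / (1 - a_e)


def multi_kappa(items):
    """Multi-kappa: expected agreement from per-annotator distributions."""
    a_o = observed_agreement(items)
    c = len(items[0])
    n_items = len(items)
    # One label distribution per annotator (column j of the item tuples).
    per_coder = [Counter(labels[j] for labels in items) for j in range(c)]
    categories = set().union(*per_coder)
    coder_pairs = list(combinations(range(c), 2))
    a_e = sum(
        (per_coder[m][cat] / n_items) * (per_coder[n][cat] / n_items)
        for cat in categories
        for m, n in coder_pairs
    ) / len(coder_pairs)
    return (a_o - a_e) / (1 - a_e)


if __name__ == "__main__":
    # Hypothetical toy data: 4 units, 3 annotators each.
    toy = [
        ("HUM", "HUM", "HUM"),
        ("HUM", "HUM", "ORG"),
        ("CONC", "CONC", "CONC"),
        ("CONC", "CONC", "CONC"),
    ]
    print(f"multi-pi    = {multi_pi(toy):.3f}")
    print(f"multi-kappa = {multi_kappa(toy):.3f}")
```

Under these definitions the two coefficients differ only in how chance agreement is estimated, which is why the abstract reports both side by side.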
Anthology ID:
L12-1312
Volume:
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Month:
May
Year:
2012
Address:
Istanbul, Turkey
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1533–1537
URL:
http://www.lrec-conf.org/proceedings/lrec2012/pdf/552_Paper.pdf
Cite (ACL):
Juliette Thuilier and Laurence Danlos. 2012. Semantic annotation of French corpora: animacy and verb semantic classes. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pages 1533–1537, Istanbul, Turkey. European Language Resources Association (ELRA).
Cite (Informal):
Semantic annotation of French corpora: animacy and verb semantic classes (Thuilier & Danlos, LREC 2012)