Juliette Thuilier
2023
“Chère maison” or “maison chère”? Transformer-based prediction of adjective placement in French
Eleni Metheniti | Tim Van de Cruys | Wissam Kerkri | Juliette Thuilier | Nabil Hathout
Findings of the Association for Computational Linguistics: EACL 2023
In French, the placement of the adjective within a noun phrase is subject to variation: it can appear either before or after the noun. We conduct experiments to assess whether transformer-based language models are able to learn the adjective position in noun phrases in French, a position which depends on several linguistic factors. Prior findings have shown that transformer models are insensitive to permuted word order, but in this work, we show that finetuned models are successful at learning and selecting the correct position of the adjective. However, this success can be attributed to the process of finetuning rather than the linguistic knowledge acquired during pretraining, as evidenced by the low accuracy of classification experiments that make use of pretrained embeddings. Comparing the finetuned models to the choices of native speakers (collected with a questionnaire), we notice that the models favor context and global syntactic roles, and are weaker with complex structures and fixed expressions.
2012
Semantic annotation of French corpora: animacy and verb semantic classes
Juliette Thuilier | Laurence Danlos
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
This paper presents a first corpus of French annotated for animacy and for verb semantic classes. The resource consists of 1,346 sentences extracted from three different corpora: the French Treebank (Abeillé and Barrier, 2004), the Est-Républicain corpus (CNRTL) and the ESTER corpus (ELRA). It is a set of parsed sentences, containing a verbal head subcategorizing two complements, with annotations on the verb and on both complements, in the TIGER XML format (Mengel and Lezius, 2000). The resource was manually annotated and manually corrected by three annotators. Animacy has been annotated following the categories of Zaenen et al. (2004). Measures of inter-annotator agreement are good (Multi-pi = 0.82 and Multi-kappa = 0.86 (k = 3, N = 2360)). As for verb semantic classes, we used three of the five levels of classification of an existing dictionary: 'Les Verbes du Français' (Dubois and Dubois-Charlier, 1997). For the higher level (generic classes), the measures of agreement are Multi-pi = 0.84 and Multi-kappa = 0.87 (k = 3, N = 1346). The inter-annotator agreements show that the annotated data are reliable for both animacy and verbal semantic classes.
2010
Approche quantitative en syntaxe : l’exemple de l’alternance de position de l’adjectif épithète en français
Juliette Thuilier | Gwendoline Fox | Benoît Crabbé
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
This article presents a statistical analysis of syntactic data that aims to better characterize the alternation in the position of the attributive adjective relative to the noun in French. We show how we used the corpora available to us (the French Treebank and the Est-Républicain corpus), together with resources from natural language processing, to carry out our study. A model built on 13 variables, drawn mainly from properties of the adjectival phrase, properties of the adjectival item, and frequency-based constraints, predicts the position of the adjective with over 93% accuracy. We stress the importance of usage-based constraints on the choice of adjective position, notably the frequency of occurrence of the adjective and the frequency of the contexts in which it appears.
Co-authors
- Benoit Crabbé 1
- Laurence Danlos 1
- Gwendoline Fox 1
- Nabil Hathout 1
- Wissam Kerkri 1