Nabil Hathout - ACL Anthology

Nabil Hathout

2025

Embeddings, topic models, LLM : un air de famille
Ludovic Tanguy | Cécile Fabre | Nabil Hathout | Lydia-Mai Ho-Dac
Actes des 32ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 1 : articles scientifiques originaux

Word embeddings, topic models, LLMs: a family affair This article presents a study on terms denoting family relationships (brother, aunt, etc.) in French using three approaches: word embeddings, topic modeling, and pre-trained language models. The first two types of representations are built from the French version of Wikipedia, while the third is derived through direct interaction with ChatGPT. The aim is to compare how these three methods represent such terms, in two main ways: by evaluating them against a structural definition of family relations (in terms of features such as gender, lineage, etc.), and by comparing the topics associated with each term. These methods reveal different modes of structuring family-related vocabulary, while also underscoring the continued necessity of corpus-based and controlled analyses to obtain reliable results.

2023

“Chère maison” or “maison chère”? Transformer-based prediction of adjective placement in French
Eleni Metheniti | Tim Van de Cruys | Wissam Kerkri | Juliette Thuilier | Nabil Hathout
Findings of the Association for Computational Linguistics: EACL 2023

In French, the placement of the adjective within a noun phrase is subject to variation: it can appear either before or after the noun. We conduct experiments to assess whether transformer-based language models are able to learn the adjective position in noun phrases in French –a position which depends on several linguistic factors. Prior findings have shown that transformer models are insensitive to permutated word order, but in this work, we show that finetuned models are successful at learning and selecting the correct position of the adjective. However, this success can be attributed to the process of finetuning rather than the linguistic knowledge acquired during pretraining, as evidenced by the low accuracy of experiments of classification that make use of pretrained embeddings. Comparing the finetuned models to the choices of native speakers (with a questionnaire), we notice that the models favor context and global syntactic roles, and are weaker with complex structures and fixed expressions.

2022

About Time: Do Transformers Learn Temporal Verbal Aspect?
Eleni Metheniti | Tim Van De Cruys | Nabil Hathout
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

Aspect is a linguistic concept that describes how an action, event, or state of a verb phrase is situated in time. In this paper, we explore whether different transformer models are capable of identifying aspectual features. We focus on two specific aspectual features: telicity and duration. Telicity marks whether the verb’s action or state has an endpoint or not (telic/atelic), and duration denotes whether a verb expresses an action (dynamic) or a state (stative). These features are integral to the interpretation of natural language, but also hard to annotate and identify with NLP methods. We perform experiments in English and French, and our results show that transformer models adequately capture information on telicity and duration in their vectors, even in their non-finetuned forms, but are somewhat biased with regard to verb tense and word order.

Organizing and Improving a Database of French Word Formation Using Formal Concept Analysis
Nyoman Juniarta | Olivier Bonami | Nabil Hathout | Fiammetta Namer | Yannick Toussaint
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We apply Formal Concept Analysis (FCA) to organize and to improve the quality of Démonette2, a French derivational database, through a detection of both missing and spurious derivations in the database. We represent each derivational family as a graph. Given that the subgraph relation exists among derivational families, FCA can group families and represent them in a partially ordered set (poset). This poset is also useful for improving the database. A family is regarded as a possible anomaly (meaning that it may have missing and/or spurious derivations) if its derivational graph is almost, but not completely identical to a large number of other families.

2021

Caractérisation des relations sémantiques entre termes multi-mots fondée sur l’analogie (Semantic relations recognition between multi-word terms by means of analogy )
Yizhe Wang | Béatrice Daille | Nabil Hathout
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

La terminologie d’un domaine rend compte de la structure du domaine grâce aux relations entre ses termes. Dans cet article, nous nous intéressons à la caractérisation des relations terminologiques qui existent entre termes multi-mots (MWT) dans les espaces vectoriels distributionnels. Nous avons constitué un jeu de données composé de MWT en français du domaine de l’environnement, reliés par des relations sémantiques lexicales. Nous présentons une expérience dans laquelle ces relations sémantiques entre MWT sont caractérisées au moyen de l’analogie. Les résultats obtenus permettent d’envisager un processus automatique pour aider à la structuration des terminologies.

Prédire l’aspect linguistique en anglais au moyen de transformers (Classifying Linguistic Aspect in English with Transformers )
Eleni Metheniti | Tim van de Cruys | Nabil Hathout
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

L’aspect du verbe décrit la manière dont une action, un événement ou un état exprimé par un verbe est lié au temps ; la télicité est la propriété d’un syntagme verbal qui présente une action ou un événement comme étant mené à son terme ; la durée distingue les verbes qui expriment une action (dynamique) ou un état (statique). Ces caractéristiques essentielles à l’interprétation du langage naturel, sont également difficiles à annoter et à identifier par les méthodes de TAL. Dans ce travail, nous estimons la capacité de différents modèles de type transformers pré-entraînés (BERT, RoBERTa, XLNet, ALBERT) à prédire la télicité et la durée. Nos résultats montrent que BERT est le plus performant sur les deux tâches, tandis que les modèles XLNet et ALBERT sont les plus faibles. Par ailleurs, les performances de la plupart des modèles sont améliorées lorsqu’on leur fournit en plus la position des verbes. Globalement, notre étude établit que les modèles de type transformers captent en grande partie la télicité et la durée.

Not Quite There Yet: Combining Analogical Patterns and Encoder-Decoder Networks for Cognitively Plausible Inflection
Basilio Calderone | Nabil Hathout | Olivier Bonami
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

The paper presents four models submitted to Part 2 of the SIGMORPHON 2021 Shared Task 0, which aims at replicating human judgements on the inflection of nonce lexemes. Our goal is to explore the usefulness of combining pre-compiled analogical patterns with an encoder-decoder architecture. Two models are designed using such patterns either in the input or the output of the network. Two extra models controlled for the role of raw similarity of nonce inflected forms to existing inflected forms in the same paradigm cell, and the role of the type frequency of analogical patterns. Our strategy is entirely endogenous in the sense that the models appealing solely to the data provided by the SIGMORPHON organisers, without using external resources. Our model 2 ranks second among all submitted systems, suggesting that the inclusion of analogical patterns in the network architecture is useful in mimicking speakers’ predictions.

2020

How Relevant Are Selectional Preferences for Transformer-based Language Models?
Eleni Metheniti | Tim Van de Cruys | Nabil Hathout
Proceedings of the 28th International Conference on Computational Linguistics

Selectional preference is defined as the tendency of a predicate to favor particular arguments within a certain linguistic context, and likewise, reject others that result in conflicting or implausible meanings. The stellar success of contextual word embedding models such as BERT in NLP tasks has led many to question whether these models have learned linguistic information, but up till now, most research has focused on syntactic information. We investigate whether Bert contains information on the selectional preferences of words, by examining the probability it assigns to the dependent word given the presence of a head word in a sentence. We are using word pairs of head-dependent words in five different syntactic relations from the SP-10K corpus of selectional preference (Zhang et al., 2019b), in sentences from the ukWaC corpus, and we are calculating the correlation of the plausibility score (from SP-10K) and the model probabilities. Our results show that overall, there is no strong positive or negative correlation in any syntactic relation, but we do find that certain head words have a strong correlation and that masking all words but the head word yields the most positive correlations in most scenarios –which indicates that the semantics of the predicate is indeed an integral and influential factor for the selection of the argument.

A study of semantic projection from single word terms to multi-word terms in the environment domain
Yizhe Wang | Beatrice Daille | Nabil Hathout
Proceedings of the 6th International Workshop on Computational Terminology

The semantic projection method is often used in terminology structuring to infer semantic relations between terms. Semantic projection relies upon the assumption of semantic compositionality: the relation that links simple term pairs remains valid in pairs of complex terms built from these simple terms. This paper proposes to investigate whether this assumption commonly adopted in natural language processing is actually valid. First, we describe the process of constructing a list of semantically linked multi-word terms (MWTs) related to the environmental field through the extraction of semantic variants. Second, we present our analysis of the results from the semantic projection. We find that contexts play an essential role in defining the relations between MWTs.

Représentation sémantique des familles dérivationnelles au moyen de frames morphosémantiques (Semantic representation of derivational families by means of morphosemantic frames )
Daniele Sanacore | Nabil Hathout | Fiammetta Namer
Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles

L’article présente un formalisme de représentation des relations morphologiques dérivationnelles inspiré de la Sémantique des Frames. La description morphosémantique y est réalisée au niveau des familles dérivationnelles au moyen de frames morphosémantiques dans lesquels les lexèmes sont définis les uns relativement aux autres. Les frames morphosémantiques permettent de rendre compte de la structure paradigmatique du lexique morphologique par l’alignement des familles qui présentent les mêmes oppositions de sens. La seconde partie de l’article est consacrée aux données qui seront utilisées pour produire (semi-) automatiquement ces représentations.

ENGLAWI: From Human- to Machine-Readable Wiktionary
Franck Sajous | Basilio Calderone | Nabil Hathout
Proceedings of the Twelfth Language Resources and Evaluation Conference

This paper introduces ENGLAWI, a large, versatile, XML-encoded machine-readable dictionary extracted from Wiktionary. ENGLAWI contains 752,769 articles encoding the full body of information included in Wiktionary: simple words, compounds and multiword expressions, lemmas and inflectional paradigms, etymologies, phonemic transcriptions in IPA, definition glosses and usage examples, translations, semantic and morphological relations, spelling variants, etc. It is fully documented, released under a free license and supplied with G-PeTo, a series of scripts allowing easy information extraction from ENGLAWI. Additional resources extracted from ENGLAWI, such as an inflectional lexicon, a lexicon of diatopic variants and the inclusion dates of headwords in Wiktionary’s nomenclature are also provided. The paper describes the content of the resource and illustrates how it can be - and has been - used in previous studies. We finally introduce an ongoing work that computes lexicographic word embeddings from ENGLAWI’s definitions.

Glawinette: a Linguistically Motivated Derivational Description of French Acquired from GLAWI
Nabil Hathout | Franck Sajous | Basilio Calderone | Fiammetta Namer
Proceedings of the Twelfth Language Resources and Evaluation Conference

Glawinette is a derivational lexicon of French that will be used to feed the Démonette database. It has been created from the GLAWI machine readable dictionary. We collected couples of words from the definitions and the morphological sections of the dictionary and then selected the ones that form regular formal analogies and that instantiate frequent enough formal patterns. The graph structure of the morphological families has then been used to identify for each couple of lexemes derivational patterns that are close to the intuition of the morphologists.

2019

Demonette2 - Une base de données dérivationnelle du français à grande échelle : premiers résultats (Demonette2 – A large scale derivational database for French: first results)
Fiammetta Namer | Lucie Barque | Olivier Bonami | Pauline Haas | Nabil Hathout | Delphine Tribout
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Volume II : Articles courts

Cet article présente la conception et le développement de Demonette2, une base de données dérivationnelle à grande échelle du français, développée dans le cadre du projet ANR Démonext (ANR-17-CE23-0005). L’article décrit les objectifs du projet, la structure de la base et expose les premiers résultats du projet, en mettant l’accent sur un enjeu crucial : la question du codage sémantique des entrées et des relations.

Toward a Computational Multidimensional Lexical Similarity Measure for Modeling Word Association Tasks in Psycholinguistics
Bruno Gaume | Lydia Mai Ho-Dac | Ludovic Tanguy | Cécile Fabre | Bénédicte Pierrejean | Nabil Hathout | Jérôme Farinas | Julien Pinquier | Lola Danet | Patrice Péran | Xavier De Boissezon | Mélanie Jucla
Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics

This paper presents the first results of a multidisciplinary project, the “Evolex” project, gathering researchers in Psycholinguistics, Neuropsychology, Computer Science, Natural Language Processing and Linguistics. The Evolex project aims at proposing a new data-based inductive method for automatically characterising the relation between pairs of french words collected in psycholinguistics experiments on lexical access. This method takes advantage of several complementary computational measures of semantic similarity. We show that some measures are more correlated than others with the frequency of lexical associations, and that they also differ in the way they capture different semantic relations. This allows us to consider building a multidimensional lexical similarity to automate the classification of lexical associations.

ParaDis and Démonette: From Theory to Resources for Derivational Paradigms
Fiammetta Namer | Nabil Hathout
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology

Semantic descriptions of French derivational relations in a families-and-paradigms framework
Daniele Sanacore | Nabil Hathout | Fiammetta Namer
Proceedings of the Second International Workshop on Resources and Tools for Derivational Morphology

2018

Corpora with Part-of-Speech Annotations for Three Regional Languages of France: Alsatian, Occitan and Picard
Delphine Bernhard | Anne-Laure Ligozat | Fanny Martin | Myriam Bras | Pierre Magistry | Marianne Vergez-Couret | Lucie Steiblé | Pascale Erhart | Nabil Hathout | Dominique Huck | Christophe Rey | Philippe Reynés | Sophie Rosset | Jean Sibille | Thomas Lavergne
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Giving Lexical Resources a Second Life: Démonette, a Multi-sourced Morpho-semantic Network for French
Nabil Hathout | Fiammetta Namer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Démonette is a derivational morphological network designed for the description of French. Its original architecture enables its use as a formal framework for the description of morphological analyses and as a repository for existing lexicons. It is fed with a variety of resources, which all are already validated. The harmonization of their content into a unified format provides them a second life, in which they are enriched with new properties, provided these are deductible from their contents. Démonette is released under a Creative Commons license. It is usable for theoretical and descriptive research in morphology, as a source of experimental material for psycholinguistics, natural language processing (NLP) and information retrieval (IR), where it fills a gap, since French lacks a large-coverage derivational resources database. The article presents the integration of two existing lexicons into Démonette. The first is Verbaction, a lexicon of deverbal action nouns. The second is Lexeur, a database of agent nouns in -eur derived from verbs or from nouns.

Wiktionnaire’s Wikicode GLAWIfied: a Workable French Machine-Readable Dictionary
Nabil Hathout | Franck Sajous
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

GLAWI is a free, large-scale and versatile Machine-Readable Dictionary (MRD) that has been extracted from the French language edition of Wiktionary, called Wiktionnaire. In (Sajous and Hathout, 2015), we introduced GLAWI, gave the rationale behind the creation of this lexicographic resource and described the extraction process, focusing on the conversion and standardization of the heterogeneous data provided by this collaborative dictionary. In the current article, we describe the content of GLAWI and illustrate how it is structured. We also suggest various applications, ranging from linguistic studies, NLP applications to psycholinguistic experimentation. They all can take advantage of the diversity of the lexical knowledge available in GLAWI. Besides this diversity and extensive lexical coverage, GLAWI is also remarkable because it is the only free lexical resource of contemporary French that contains definitions. This unique material opens way to the renewal of MRD-based methods, notably the automated extraction and acquisition of semantic relations.

2015

Évaluation sur mesure de modèles distributionnels sur un corpus spécialisé : comparaison des approches par contextes syntaxiques et par fenêtres graphiques [Tailor-made evaluation of distributional models on a specialized corpus: comparison of syntactic context and graphical window approaches]
Ludovic Tanguy | Franck Sajous | Nabil Hathout
Traitement Automatique des Langues, Volume 56, Numéro 2 : Sémantique distributionnelle [Distributional semantics]

2014

Démonette, a French derivational morpho-semantic network
Nabil Hathout | Fiammetta Namer
Linguistic Issues in Language Technology, Volume 11, 2014 - Theoretical and Computational Morphology: New Trends and Synergies

Démonette is a derivational morphological network created from information provided by two existing lexical resources, DériF and Morphonette. It features a formal architecture in which words are associated with semantic types and where morphological relations, labelled with concrete and abstract bi-oriented definitions, connect derived words with their base and indirectly related words with each other.

The Démonette Lexical Database: between Constructional Semantics and Word Formation (La base lexicale Démonette : entre sémantique constructionnelle et morphologie dérivationnelle) [in French]
Nabil Hathout | Fiammetta Namer
Proceedings of TALN 2014 (Volume 1: Long Papers)

GLÀFF, a Large Versatile French Lexicon
Nabil Hathout | Franck Sajous | Basilio Calderone
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces GLAFF, a large-scale versatile French lexicon extracted from Wiktionary, the collaborative online dictionary. GLAFF contains, for each entry, inflectional features and phonemic transcriptions. It distinguishes itself from the other available French lexicons by its size, its potential for constant updating and its copylefted license. We explain how we have built GLAFF and compare it to other known resources in terms of coverage and quality of the phonemic transcriptions. We show that its size and quality are strong assets that could allow GLAFF to become a reference lexicon for French NLP and linguistics. Moreover, other derived lexicons can easily be based on GLAFF to satisfy specific needs of various fields such as psycholinguistics.

Acquisition and enrichment of morphological and morphosemantic knowledge from the French Wiktionary
Nabil Hathout | Franck Sajous | Basilio Calderone
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing

TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)
Cécile Fabre | Nabil Hathout | Lydia-Mai Ho-Dac | François Morlane-Hondère | Philippe Muller | Franck Sajous | Ludovic Tanguy | Tim Van de Cruys
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)

Presentation of the SemDis 2014 workshop: distributional semantics for two tasks - lexical substitution and exploration of specialized corpora (Présentation de l’atelier SemDis 2014 : sémantique distributionnelle pour la substitution lexicale et l’exploration de corpus spécialisés) [in French]
Cécile Fabre | Nabil Hathout | Lydia-Mai Ho-Dac | François Morlane-Hondère | Philippe Muller | Franck Sajous | Ludovic Tanguy | Tim Van de Cruys
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)

Tuning distributional analysis for a small specialized corpus (Ajuster l’analyse distributionnelle à un corpus spécialisé de petite taille) [in French]
Cécile Fabre | Nabil Hathout | Franck Sajous | Ludovic Tanguy
TALN-RECITAL 2014 Workshop SemDis 2014 : Enjeux actuels de la sémantique distributionnelle (SemDis 2014: Current Challenges in Distributional Semantics)

2013

GLÀFF, a Large Versatile French Lexicon (GLÀFF, un Gros Lexique À tout Faire du Français) [in French]
Franck Sajous | Nabil Hathout | Basilio Calderone
Proceedings of TALN 2013 (Volume 1: Long Papers)

2011

Création de clusters sémantiques dans des familles morphologiques à partir du TLFi (Creating semantic clusters in morphological families from the TLFi)
Nuria Gala | Nabil Hathout | Alexis Nasr | Véronique Rey | Selja Seppälä
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

La constitution de ressources linguistiques est une tâche longue et coûteuse. C’est notamment le cas pour les ressources morphologiques. Ces ressources décrivent de façon approfondie et explicite l’organisation morphologique du lexique complétée d’informations sémantiques exploitables dans le domaine du TAL. Le travail que nous présentons dans cet article s’inscrit dans cette perspective et, plus particulièrement, dans l’optique d’affiner une ressource existante en s’appuyant sur des informations sémantiques obtenues automatiquement. Notre objectif est de caractériser sémantiquement des familles morpho-phonologiques (des mots partageant une même racine et une continuité de sens). Pour ce faire, nous avons utilisé des informations extraites du TLFi annoté morpho-syntaxiquement. Les premiers résultats de ce travail seront analysés et discutés.

Règles et paradigmes en morphologie informatique lexématique (Rules and paradigms in lexematic computer morphology)
Nabil Hathout | Fiammetta Namer
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

Les familles de mots produites par deux analyseurs morphologiques, DériF (basé sur des règles) et Morphonette (basé sur l’analogie), appliqués à un même corpus lexical, sont comparées. Cette comparaison conduit à l’examen de trois sous-ensembles : - un sous-ensemble commun aux deux systèmes dont la taille montre que, malgré leurs différences, les approches expérimentées par chaque système sont valides et décrivent en partie la même réalité morphologique. - un sous-ensemble propre à DériF et un autre à Morphonette. Ces ensembles (a) nous renseignent sur les caractéristiques propres à chaque système, et notamment sur ce que l’autre ne peut pas produire, (b) ils mettent en évidence les erreurs d’un système, en ce qu’elles n’apparaissent pas dans l’autre, (c) ils font apparaître certaines limites de la description, notamment celles qui sont liées aux objets et aux notions théoriques comme les familles morphologiques, les bases, l’existence de RCL « transversales » entre les lexèmes qui n’ont pas de relation d’ascendance ou de descendance.

Traitement Automatique des Langues, Volume 52, Numéro 2 : Vers la morphologie et au-delà [Toward Morphology and beyond]
Nabil Hathout | Fiammetta Namer
Traitement Automatique des Langues, Volume 52, Numéro 2 : Vers la morphologie et au-delà [Toward Morphology and beyond]

Préface [Foreword]
Nabil Hathout | Fiammetta Namer
Traitement Automatique des Langues, Volume 52, Numéro 2 : Vers la morphologie et au-delà [Toward Morphology and beyond]

2009

Acquisition morphologique à partir d’un dictionnaire informatisé
Nabil Hathout
Actes de la 16ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

L’article propose un modèle linguistique et informatique permettant de faire émerger la structure morphologique dérivationnelle du lexique à partir des régularités sémantiques et formelles des mots qu’il contient. Ce modèle est radicalement lexématique. La structure morphologique est constituée par les relations que chaque mot entretient avec les autres unités du lexique et notamment avec les mots de sa famille morphologique et de sa série dérivationnelle. Ces relations forment des paradigmes analogiques. La modélisation a été testée sur le lexique du français en utilisant le dictionnaire informatisé TLFi.

2008

Acquistion of the Morphological Structure of the Lexicon Based on Lexical Similarity and Formal Analogy
Nabil Hathout
Coling 2008: Proceedings of the 3rd Textgraphs workshop on Graph-based Algorithms for Natural Language Processing

2007

Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
Nabil Hathout | Philippe Muller
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters
Nabil Hathout | Philippe Muller
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations
Nabil Hathout | Philippe Muller
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Démonstrations

2006

Productivité quantitative des suffixations par -ité et -Able dans un corpus journalistique moderne
Natalia Grabar | Delphine Tribout | Georgette Dal | Bernard Fradin | Nabil Hathout | Stéphanie Lignon | Fiammetta Namer | Clément Plancq | François Yvon | Pierre Zweigenbaum
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Dans ce travail, nous étudions en corpus la productivité quantitative des suffixations par -Able et par -ité du français, d’abord indépendamment l’une de l’autre, puis lorsqu’elles s’enchaînent dérivationnellement (la suffixation en -ité s’applique à des bases en -Able dans environ 15 % des cas). Nous estimons la productivité de ces suffixations au moyen de mesures statistiques dont nous suivons l’évolution par rapport à la taille du corpus. Ces deux suffixations sont productives en français moderne : elles forment de nouveaux lexèmes tout au long des corpus étudiés sans qu’on n’observe de saturation, leurs indices de productivité montrent une évolution stable bien qu’étant dépendante des calculs qui leur sont appliqués. On note cependant que, de façon générale, de ces deux suffixations, c’est la suffixation par -ité qui est la plus fréquente en corpus journalistique, sauf précisément quand -ité s’applique à un adjectif en -Able. Étant entendu qu’un adjectif en -Able et le nom en -ité correspondant expriment la même propriété, ce résultat indique que la complexité de la base est un paramètre à prendre en considération dans la formation du lexique possible.

Synonym Extraction Using a Semantic Distance on a Dictionary
Philippe Muller | Nabil Hathout | Bruno Gaume
Proceedings of TextGraphs: the First Workshop on Graph Based Methods for Natural Language Processing

2004

Désambiguïsation par proximité structurelle
Bruno Gaume | Nabil Hathout | Philippe Muller
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

L’article présente une méthode de désambiguïsation dans laquelle le sens est déterminé en utilisant un dictionnaire. La méthode est basée sur un algorithme qui calcule une distance « sémantique » entre les mots du dictionnaire en prenant en compte la topologie complète du dictionnaire, vu comme un graphe sur ses entrées. Nous l’avons testée sur la désambiguïsation des définitions du dictionnaire elles-mêmes. L’article présente des résultats préliminaires, qui sont très encourageants pour une méthode ne nécessitant pas de corpus annoté.

Word Sense Disambiguation using a dictionary for sense similarity measure
Bruno Gaume | Nabil Hathout | Philippe Muller
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2002

Webaffix : un outil d’acquisition morphologique dérivationnelle à partir du Web
Ludovic Tanguy | Nabil Hathout
Actes de la 9ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

L’article présente Webaffix, un outil d’acquisition de couples de lexèmes morphologiquement apparentés à partir du Web. La méthode utilisé est inductive et indépendante des langues particulières. Webaffix (1) utilise un moteur de recherche pour collecter des formes candidates qui contiennent un suffixe graphémique donné, (2) prédit les bases potentielles de ces candidats et (3) recherche sur le Web des cooccurrences des candidats et de leurs bases prédites. L’outil a été utilisé pour enrichir Verbaction, un lexique de liens entre verbes et noms d’action ou d’événement correspondants. L’article inclut une évaluation des liens morphologiques acquis.

Webaffix: Discovering Morphological Links on the WWW
Nabil Hathout | Ludovic Tanguy
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

From WordNet to CELEX: acquiring morphological links from dictionaries of synonyms
Nabil Hathout
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

Analogies morpho-synonymiques. Une méthode d’acquisition automatique de liens morphologiques à partir d’un dictionnaire de synonymes
Nabil Hathout
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Cet article présente une méthode de construction automatique de liens morphologiques à partir d’un dictionnaire de synonymes. Une analyse de ces liens met en lumière certains aspects de la structure morphologique du lexique dont on peut tirer partie pour identifier les variations allomorphiques des suffixations extraites.

1996

Computational Semantics of Time/Negation Interaction
Pascal Amsili | Nabil Hathout
COLING 1996 Volume 1: The 16th International Conference on Computational Linguistics

Co-authors

Tim Van de Cruys 6

Cécile Fabre 5

Lydia-Mai Ho-Dac 4

Eleni Metheniti 4

Olivier Bonami 3

Béatrice Daille 2

François Morlane-Hondère 2

Daniele Sanacore 2

Delphine Tribout 2

Pascal Amsili 1

Delphine Bernhard 1

Georgette Dal 1

Xavier De Boissezon 1

Pascale Erhart 1

Jérôme Farinas 1

Bernard Fradin 1

Natalia Grabar 1

Dominique Huck 1

Mélanie Jucla 1

Nyoman Juniarta 1

Wissam Kerkri 1

Thomas Lavergne 1

Stéphanie Lignon 1

Anne-Laure Ligozat 1

Pierre Magistry 1

Bénédicte Pierrejean 1

Julien Pinquier 1

Clément Plancq 1

Patrice Péran 1

Véronique Rey 1

Christophe Rey 1

Philippe Reynés 1

Sophie Rosset 1

Selja Seppälä 1

Lucie Steiblé 1

Juliette Thuilier 1

Yannick Toussaint 1

Marianne Vergez-Couret 1

François Yvon 1

Pierre Zweigenbaum 1

Venues