Sylvain Kahane

2025

pdf bib
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)
Eva Hajičová | Sylvain Kahane
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)

pdf bib abs
A morpheme-based treebank for Gbaya, an Ubanguian language of Central Africa
Paulette Roulon-Doko | Sylvain Kahane | Bruno Guillaume
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)

In this paper, we present the first treebank for Gbaya, a language from the under-resourced Niger-Congo family. The language has a rich system of tonal morphemes and virtually no affixes. The dependency analysis is based on a morpheme-based tokenisation, and the treebank is also distributed in a word-based Universal Dependencies version. Several constructions are discussed in the paper: genitive construction, clause coordination, sentence particles, adverbial and relative clauses, serial verb constructions, reported speech, topicalization, and focalization.

pdf bib abs
A corpus-driven description of OV order in Archaic Chinese
Qishen Wu | Santiago Herrera | Pierre Magistry | Sylvain Kahane
Proceedings of the Eighth International Conference on Dependency Linguistics (Depling, SyntaxFest 2025)

This paper presents a quantitative study of Object-Verb (OV) order in Archaic Chinese based on a Universal Dependencies (UD) treebank. Treating word order as a binary choice (OV vs VO), we train a sparse logistic-regression classifier that selects the most salient syntactic features needed for accurate prediction, in order to investigate the specific syntactic contexts that allow OV word order and to identify to what extent these factors favour this order. The ranked features are understood as interpretable rules, and their coverage and precision as quantitative properties of each rule. The approach confirms earlier qualitative findings (e.g. pronoun object fronting and negation favour OV) and uncovers new contrasts in word order between different reflexive pronouns. It also identifies annotation errors that we corrected in the final analysis, illustrating how quantitative models, combined with fine-grained corpus analysis, can improve treebank quality. Our study demonstrates that lightweight machine-learning techniques applied to an existing syntactic resource can reveal fine-grained patterns in historical word order, and that the approach can be reapplied to other languages.

pdf bib abs
Creating a multi-layer Treebank for Tundra Nenets
Nikolett Mus | Bruno Guillaume | Sylvain Kahane | Daniel Zeman
Proceedings of the 10th International Workshop on Computational Linguistics for Uralic Languages

This paper presents the development of the Tundra Nenets Universal Dependencies (UD) Treebank, the first syntactically annotated resource for the Samoyedic branch of the Uralic family. The treebank integrates spoken-language data and adopts the morphologically enhanced Surface-Syntactic UD (mSUD) framework to capture inflectional morphology and morphology-based syntactic relations. It further incorporates Information Structure annotation. The methodological workflow includes data selection, transcription conventions, sentence and lexeme segmentation, annotation of spoken-language features, lemmatization, treatment of morpheme status, part-of-speech and morphological tagging, and syntactic annotation based on the functional and distributional properties of syntactic elements. We also outline the principles guiding multi-level annotation and justify the theoretical choices underlying the integration of prosodic, morphological, and syntactic information.

pdf bib abs
Extraction of Contrastive Rules from Syntactic Treebanks: A Case Study in Romance Languages
Santiago Herrera | Ioana-Madalina Silai | Caio Corro | Bruno Guillaume | Sylvain Kahane
Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025)

In this paper, we develop a data-driven contrastive framework to extract common and distinctive linguistic descriptions from syntactic treebanks. The extracted contrastive rules are defined by a statistically significant difference in precision and classified as common or distinctive rules across the set of treebanks. We illustrate our method by working on object word order using Universal Dependencies (UD) treebanks in 6 Romance languages: Brazilian Portuguese, Catalan, French, Italian, Romanian and Spanish. We discuss the limitations faced due to inconsistent annotation and the feasibility of conducting contrastive studies using the UD collection.

pdf bib abs
An intonosyntactic treebank for spoken French: What is new with Rhapsodie?
Maria Paz Botero-Garcia | Emmett Strickland | Bruno Guillaume | Sylvain Kahane | Anne Lacheret-Dujour
Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)

This paper presents a new format of the Rhapsodie Treebank, which contains both syntactic and prosodic annotations, offering a comprehensive dataset for the study of spoken French. This integrated format allows for complex multilevel queries and opens the way for intonosyntactic studies.

pdf bib abs
Status of morphosyntactic features: Illustration with written and spoken French UD treebanks
Sylvain Kahane | Bruno Guillaume | Léna Brun | Simeng Song
Proceedings of the 23rd International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2025)

Morphosyntactic features used in UD treebanks differ in status. While most of them correspond to values of inflectional morphemes, some describe lexical subclasses or are just conventional names of polysemic morphemes. Syncretism is also a challenge, because exact values can only be deduced from contextual information. We propose an attempt at clarification and an implementation in the treebanks of written and spoken French.

2024

pdf bib abs
Régression logistique parcimonieuse pour l’extraction automatique de règles de grammaire
Santiago Herrera | Caio Corro | Sylvain Kahane
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

We propose a new approach for extracting and exploring grammatical patterns from treebanks, with the aim of building syntactic grammar rules. More specifically, we focus on two linguistic phenomena, agreement and word order, using a large search space and paying particular attention to the ranking of the rules. To this end, we use a linear classifier trained with L1 penalization to identify the most salient features. We then associate quantitative information with each rule. Our method discovers rules of varying granularity, some well known and others less so. In this work, we focus on rules extracted from a French corpus.

pdf bib abs
De nouvelles méthodes pour l’exploration de l’interface syntaxe-prosodie : un treebank intonosyntaxique et un système de synthèse pour le pidgin nigérian
Emmett Strickland | Anne Lacheret-Dujour | Marc Evrard | Sylvain Kahane | Dana Aubakirova | Dorin Doncenco | Diego Torres | Perrine Quennehen | Bruno Guillaume
Actes de la 31ème Conférence sur le Traitement Automatique des Langues Naturelles, volume 1 : articles longs et prises de position

This article presents two recently developed resources for exploring the prosody-syntax interface in Nigerian Pidgin, a low-resource language of West Africa. The first is an intonosyntactic treebank in which each token is associated with a series of syllable-level prosodic features, which makes it possible to analyze various syntactic and prosodic structures using a single interface. The second is a speech synthesis system trained on the same dataset, designed to allow direct control over the intonation contours of the generated speech. This tool was developed to let us test hypotheses formulated from the exploration of the treebank. This article is largely an adaptation of two recent publications presenting each tool, with an emphasis on their interconnection in our ongoing research.

pdf bib abs
Joint Annotation of Morphology and Syntax in Dependency Treebanks
Bruno Guillaume | Kim Gerdes | Kirian Guiller | Sylvain Kahane | Yixuan Li
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we compare different ways to annotate both syntactic and morphological relations in a dependency treebank and we propose new formats we call mSUD and mUD, compatible with the Universal Dependencies (UD) schema for syntactic treebanks. We emphasize mSUD rather than mUD, the former being based on distributional criteria for the choice of the head of any combination, which allows us to clearly encode the internal structure of a word, that is, the derivational path. We investigate different problems posed by a morph-based annotation, concerning tokenization, choice of the head of a morph combination, relations between morphs, and additional features needed, such as the token type differentiating roots and derivational and inflectional affixes. We show how our annotation schema can be applied to different languages, from polysynthetic languages such as Yupik to isolating languages such as Chinese.

pdf bib abs
New Methods for Exploring Intonosyntax: Introducing an Intonosyntactic Treebank for Nigerian Pidgin
Emmett Strickland | Anne Lacheret-Dujour | Sylvain Kahane | Marc Evrard | Perrine Quennehen | Bernard Caron | Francis Egbokhare | Bruno Guillaume
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper presents a new phonetic resource for Nigerian Pidgin, a low-resource language of West Africa. Aiming to provide a new tool for research on intonosyntax, we have augmented an existing syntactic treebank of Nigerian Pidgin, associating each orthographically transcribed token with a series of syllable-level alignments and phonetizations. Syllables are further described using a set of continuous and discrete prosodic features. This new approach provides a simple tool for researchers to explore the prosodic characteristics of various syntactic phenomena. In this paper, we present the format of the corpus, the various features added, and several explorations that can be performed using an online interface. We also present a prosodically specified lexicon extracted using this resource. In it, each orthographic form is accompanied by the frequency of its phoneme-level variants, as well as the suprasegmental features that most frequently accompany each syllable. Finally, we present several additional case studies on how this corpus can be used in the study of the language’s prosody.

pdf bib abs
New Proposal of Greenberg’s Universal 14 from Typometrics
Antoni Brosa-Rodríguez | Sylvain Kahane
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In his Universal 14, Greenberg stated that the normal and dominant order in all the world’s languages is to place the condition before the conclusion in conditional sentences. We review this claim quantitatively, based on occurrences in real texts in more than 50 languages. We find that Greenberg’s proposal is broadly correct but needs reformulation to hold in all cases. We propose a quantitatively based and updated Universal 14, which gives a better account of the different languages analyzed and which is fulfilled in 100% of the cases (as opposed to Greenberg’s 60% in our sample). In addition, we analyze adverbial clauses. Once we obtain the occurrence data for their direction (before or after the main verb), we formulate a new universal in a typometrical way: 100% of the languages show a higher proportion of preceding conditional clauses than of adverbial clauses, regardless of their type or the direction preference for adverbial clauses. A relationship between the SOV type and a stricter initial placement of conditionals is also proposed.

pdf bib abs
Sparse Logistic Regression with High-order Features for Automatic Grammar Rule Extraction from Treebanks
Santiago Herrera | Caio Corro | Sylvain Kahane
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Descriptive grammars are highly valuable, but writing them is time-consuming and difficult. Furthermore, while linguists typically use corpora to create them, grammar descriptions often lack quantitative data. As for formal grammars, they can be challenging to interpret. In this paper, we propose a new method to extract and explore significant fine-grained grammar patterns and potential syntactic grammar rules from treebanks, in order to create an easy-to-understand corpus-based grammar. More specifically, we extract descriptions and rules across different languages for two linguistic phenomena, agreement and word order, using a large search space and paying special attention to the ranking order of the extracted rules. For that, we use a linear classifier to extract the most salient features that predict the linguistic phenomena under study. We associate statistical information with each rule, and we compare the ranking of the model’s results to those of other quantitative and statistical measures. Our method captures both well-known and less well-known significant grammar rules in Spanish, French, and Wolof.

2023

pdf bib abs
Word order flexibility: a typometric study
Sylvain Kahane | Ziqian Peng | Kim Gerdes
Proceedings of the Seventh International Conference on Dependency Linguistics (Depling, GURT/SyntaxFest 2023)

This paper introduces a typometric measure of flexibility, which quantifies the variability of head-dependent word order on the whole set of treebanks of a language or on specific constructions. The measure is based on the notion of head-initiality, and we show that it can be computed for all languages of the Universal Dependencies treebank set, that it does not require ad hoc thresholds to categorize languages or constructions, and that it can be applied with any granularity of constructions and languages. We compare our results with Bakker’s (1998) categorical flexibility index. Typometric flexibility is shown to be a good measure for characterizing the language distribution with respect to word order for a given construction, and for estimating whether a construction predicts the global word order behavior of a language.

pdf bib abs
Autogramm : développement simultané de treebanks et de grammaires à partir de corpus
Sylvain Kahane | Santiago Herrera | Bruno Guillaume | Kim Gerdes
Actes de CORIA-TALN 2023. Actes de la 30e Conférence sur le Traitement Automatique des Langues Naturelles (TALN), volume 6 : projets

This research project aims to create new dependency treebanks for under-resourced languages, unifying their development as far as possible with that of quantitative descriptive grammars. We will present our treebank processing and development pipeline and discuss the type of grammar we want to extract. Finally, we will examine the use of these resources in quantitative typology.

This article considers the annotation of subjects in UD treebanks. The identification of the subject poses a particular problem in Wolof, due to pronominal indices whose status as a pronoun or a pronominal affix is uncertain. In the UD treebank available for Wolof (Dione, 2019), these have been annotated, depending on the construction, either as true subjects or as morphosyntactic features agreeing with the verb. The study of this corpus of 40,000 words allows us to show that the problem is indeed difficult to solve, especially since Wolof has a rich system of auxiliaries and several basic constructions with different properties. Before addressing the case of Wolof, we will present the simpler, but partly comparable, case of French, where subject clitics also tend to behave like affixes, and subjecthood can move from the preverbal to the detached position. We will also make several annotation recommendations that would avoid overwriting information regarding subjecthood.

2019

pdf bib
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)
Kim Gerdes | Sylvain Kahane
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

pdf bib
Interpreting and defining connections in a dependency structure
Sylvain Kahane
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

pdf bib
Exceptive constructions. A Dependency-based Analysis
Mohamed Galal | Sylvain Kahane | Yomna Safwat
Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)

pdf bib
A Surface-Syntactic UD Treebank for Naija
Bernard Caron | Marine Courtin | Kim Gerdes | Sylvain Kahane
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

pdf bib
Improving Surface-syntactic Universal Dependencies (SUD): MWEs and deep syntactic features
Kim Gerdes | Bruno Guillaume | Sylvain Kahane | Guy Perrier
Proceedings of the 18th International Workshop on Treebanks and Linguistic Theories (TLT, SyntaxFest 2019)

pdf bib
Advantages of the flux-based interpretation of dependency length minimization
Sylvain Kahane | Chunxiao Yan
Proceedings of the First Workshop on Quantitative Syntax (Quasy, SyntaxFest 2019)

pdf bib
Rediscovering Greenberg’s Word Order Universals in UD
Kim Gerdes | Sylvain Kahane | Xinying Chen
Proceedings of the Third Workshop on Universal Dependencies (UDW, SyntaxFest 2019)

2018

pdf bib
Une approche mathématique de la notion de structure syntaxique : raisonner en termes de connexions plutôt que d’unités [A mathematical approach of the notion of syntactic structure: reasoning in terms of connections rather than units]
Sylvain Kahane
Traitement Automatique des Langues, Volume 59, Numéro 1 : Varia [Varia]

pdf bib abs
SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD
Kim Gerdes | Bruno Guillaume | Sylvain Kahane | Guy Perrier
Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)

This article proposes a surface-syntactic annotation scheme called SUD that is near-isomorphic to the Universal Dependencies (UD) annotation scheme while following distributional criteria for defining the dependency tree structure and the naming of the syntactic functions. Rule-based graph transformation grammars allow for a bi-directional transformation of UD into SUD. The back-and-forth transformation can serve as an error-mining tool to assure the intra-language and inter-language coherence of the UD treebanks.

2017

pdf bib
What are the limitations on the flux of syntactic dependencies? Evidence from UD treebanks
Sylvain Kahane | Chunxiao Yan | Marie-Amélie Botalla
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

pdf bib
To What Extent is Immediate Constituency Analysis Dependency-Based? A Survey of Foundational Texts
Nicolas Mazziotta | Sylvain Kahane
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

pdf bib
Multi-word annotation in syntactic treebanks - Propositions for Universal Dependencies
Sylvain Kahane | Marine Courtin | Kim Gerdes
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

2016

pdf bib
Dependency Annotation Choices: Assessing Theoretical and Practical Issues of Universal Dependencies
Kim Gerdes | Sylvain Kahane
Proceedings of the 10th Linguistic Annotation Workshop held in conjunction with ACL 2016 (LAW-X 2016)

pdf bib abs
Encoding a syntactic dictionary into a super granular unification grammar
Sylvain Kahane | François Lareau
Proceedings of the Workshop on Grammar and Lexicon: interactions and interfaces (GramLex)

We show how to turn a large-scale syntactic dictionary into a dependency-based unification grammar where each piece of lexical information calls a separate rule, yielding a super granular grammar. Subcategorization, raising and control verbs, auxiliaries and copula, passivization, and tough-movement are discussed. We focus on the semantics-syntax interface and offer a new perspective on syntactic structure.

pdf bib
From built examples to attested examples: a syntax-based query for non-specialists
Ilaine Wang | Sylvain Kahane | Isabelle Tellier
Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Posters

2015

pdf bib
Les trois dimensions d’une modélisation formelle de la langue : syntagmatique, paradigmatique et sémiotique [The three dimensions of a formal modeling of natural language: syntagmatic, paradigmatic, and semiotic]
Sylvain Kahane
Traitement Automatique des Langues, Volume 56, Numéro 1 : Varia [Varia]

pdf bib
Non-constituent coordination and other coordinative constructions as Dependency Graphs
Kim Gerdes | Sylvain Kahane
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

pdf bib
Dependency-based analyses for function words – Introducing the polygraphic approach
Sylvain Kahane | Nicolas Mazziotta
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

pdf bib
Syntactic Polygraphs. A Formalism Extending Both Constituency and Dependency
Sylvain Kahane | Nicolas Mazziotta
Proceedings of the 14th Meeting on the Mathematics of Language (MoL 2015)

2014

The main objective of the Rhapsodie project (ANR Rhapsodie 07 Corp-030-01) was to define rich, explicit, and reproducible schemes for the annotation of prosody and syntax in different genres (± spontaneous, ± planned, face-to-face interviews vs. broadcast, etc.), in order to study the prosody/syntax/discourse interface in spoken French, and their roles in the segmentation of speech into discourse units (Lacheret, Kahane, & Pietrandrea forthcoming). We here describe the deliverable, a syntactic and prosodic treebank of spoken French, composed of 57 short samples of spoken French (5 minutes long on average, amounting to 3 hours of speech and 33,000 words), orthographically and phonetically transcribed. The transcriptions and the annotations are all aligned on the speech signal: phonemes, syllables, words, speakers, overlaps. This resource is freely available at www.projet-rhapsodie.fr. The sound samples (wav/mp3), the acoustic analysis (original F0 curve manually corrected and automatic stylized F0, pitch format), the orthographic transcriptions (txt), the microsyntactic annotations (tabular format), the macrosyntactic annotations (txt, tabular format), the prosodic annotations (xml, textgrid, tabular format), and the metadata (xml and html) can be freely downloaded under the terms of the Creative Commons licence Attribution - Noncommercial - Share Alike 3.0 France. The metadata are encoded in the IMDI-CMFI format and can be parsed online.

pdf bib abs
Correcting and Validating Syntactic Dependency in the Spoken French Treebank Rhapsodie
Rachel Bawden | Marie-Amélie Botalla | Kim Gerdes | Sylvain Kahane
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This article presents the methods, results, and precision of the syntactic annotation process of the Rhapsodie Treebank of spoken French. The Rhapsodie Treebank is a 33,000-word corpus annotated for prosody and syntax, licensed in its entirety under Creative Commons. The syntactic annotation contains two levels: a macro-syntactic level, containing a segmentation into illocutionary units (including discourse markers, parentheses, …), and a micro-syntactic level, including dependency relations and various paradigmatic structures, called pile constructions, the latter being particularly frequent and diverse in spoken language. The micro-syntactic annotation process, presented in this paper, includes a semi-automatic preparation of the transcription, the application of a syntactic dependency parser, transcoding of the parsing results to the Rhapsodie annotation scheme, manual correction by multiple annotators followed by a validation process, and finally the application of coherence rules that check common errors. The good inter-annotator agreement scores are presented and analyzed in greater detail. The article also includes the list of functions used in the dependency annotation and for the distinction of various pile constructions and presents the ideas underlying these choices.

pdf bib abs
Macrosyntactic Segmenters of a French Spoken Corpus
Ilaine Wang | Sylvain Kahane | Isabelle Tellier
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The aim of this paper is to describe an automated process to segment transcribed spoken French data into macrosyntactic units. While sentences are delimited by punctuation marks in written data, there is no obvious cue or boundary marking major units in speech. As a reference, we used the manual annotation of macrosyntactic units, based on illocutionary as well as syntactic criteria, developed for the Rhapsodie corpus, a 33,000-word prosodic and syntactic treebank. Our segmenters were built using machine learning methods as supervised classifiers: segmentation is about identifying the boundaries of units, which amounts to classifying each interword space. We trained six different models on Rhapsodie using different sets of features, including prosodic and morphosyntactic cues, on the assumption that their combination would be relevant for the task. Both types of cues could result either from manual annotation/correction or from fully automated processes; comparing the two might help determine the cost of manual effort, especially for the 3M words of spoken French of the Orfeo project to which these experiments contribute.

2013

pdf bib
Predicative Adjunction in a Modular Dependency Grammar
Sylvain Kahane
Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013)

2012

pdf bib
Intonosyntactic Data Structures: The Rhapsodie Treebank of Spoken French
Kim Gerdes | Sylvain Kahane | Anne Lacheret | Paola Pietrandrea | Arthur Truong
Proceedings of the Sixth Linguistic Annotation Workshop

2011

pdf bib abs
Une modélisation des dites alternances de portée des quantifieurs par des opérations de combinaison des groupes nominaux (A model of called alternations of quantifiers scope by combination of nominal groups operations)
Sylvain Kahane
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

We show that the different interpretations of a combination of several NPs can be modeled by two combination operations on the referents of these NPs, called cumulative combination and distributive combination. We study definite and indefinite NPs as well as quantified or plural NPs, and we show how the combination of an NP with other elements can induce collective or individualizing interpretations. Depending on how an NP combines with other NPs, the computation of its referent may be a function of the latter; this defines an anchoring relation for each NP, which induces a partial order on the NPs. Considering this relation rather than the converse relation of scope simplifies the computation of the interpretation of NPs and utterances. Graphical and algebraic semantic representations that do not consider scope are proposed for so-called scope alternations.

2010

pdf bib abs
Une approche paresseuse de l’analyse sémantique ou comment construire une interface syntaxe-sémantique à partir d’exemples
François-Régis Chaumartin | Sylvain Kahane
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

This article shows how to compute a syntax-semantics interface from an arbitrary, interchangeable dependency parser, varied lexical resources, and a base of examples paired with their semantic representations. Each example makes it possible to build one interface rule. Our semantic representations are hierarchical graphs of predicate-argument relations between lexical senses, and our syntax-semantics interface is a polarized correspondence grammar. We show how to obtain a highly modular system by computing certain rules through "subtraction" of less modular rules.

pdf bib
Depends on What the French Say - Spoken Corpus Annotation with and beyond Syntactic Functions
José Deulofeu | Lucie Duffort | Kim Gerdes | Sylvain Kahane | Paola Pietrandrea
Proceedings of the Fourth Linguistic Annotation Workshop

2007

pdf bib abs
Traduction, restructurations syntaxiques et grammaires de correspondance
Sylvain Kahane
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

This article presents a new formalization of the transfer-based translation model of Meaning-Text Theory. Our model uses polarized correspondence grammars and strictly separates the monolingual models, a minimal bilingual lexicon, and universal restructuring rules directly associated with syntactic lexical functions.

2006

pdf bib
Polarized Unification Grammars
Sylvain Kahane
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib
A Polynomial Parsing Algorithm for the Topological Model: Synchronizing Constituent and Dependency Grammars, Illustrated by German Word Order Phenomena
Kim Gerdes | Sylvain Kahane
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

pdf bib abs
Grammaire d’Unification Sens-Texte : modularité et polarisation
Sylvain Kahane | François Lareau
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

The aim of this article is to present the current state of the Meaning-Text Unification Grammar model, in particular since the formal foundations of the model were clarified through the development of Polarized Unification Grammars. The emphasis is on the architecture of the model and the role of polarization in the articulation of the different modules: the semantics-syntax interface, the syntax-morphotopology interface, and the grammars describing the different levels of representation. We study how the analysis and generation procedures can be controlled through different strategies for neutralizing the different polarities.

pdf bib abs
Structure des représentations logiques et interface sémantique-syntaxe
Sylvain Kahane
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article deals with the structure of logical representations of natural-language utterances. By logical representation, we mean a semantic representation that includes a treatment of quantifier scope. We will show that such a representation fundamentally combines two underlying structures, a "predicative" structure and a logical hierarchical structure, and that distinguishing the two allows, for example, an elegant treatment of underspecification. We will propose a polarized grammar for directly manipulating the structure of logical representations (without going through a linear language with variables), as well as a grammar for the semantics-syntax interface.

2004

pdf bib abs
Grammaires d’unification polarisées
Sylvain Kahane
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article proposes a generic mathematical formalism for combining structures. The saturation of the final structures is controlled by polarizing the objects of the elementary structures. This formalism makes it possible to bring out and formalize the hidden procedural mechanisms of many formalisms, including rewriting grammars, dependency grammars, TAG, HPSG, and LFG.

2003

pdf bib abs
Les signes grammaticaux dans l’interface sémantique-syntaxe d’une grammaire d’unification
Sylvain Kahane
Actes de la 10ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

This article presents a unification grammar in which grammatical morphemes are treated in the same way as lexical morphemes: both types of morphemes are treated as full-fledged signs and are described by elementary structures that can unify directly with one another (which makes it a dependency grammar). We will illustrate our point with a fragment of the semantics-syntax interface of French for the verb and the adjective: voices, moods, tenses, impersonal, and tough-movement.

2001

pdf bib abs
Grammaires de dépendance formelles et théorie Sens-Texte
Sylvain Kahane
Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles. Tutoriels

A dependency grammar is any formal grammar that manipulates dependency structures as syntactic representations. The aim of this course is to present both dependency grammars (formalisms and algorithms for synthesis and analysis) and Meaning-Text Theory, a rich yet little-known linguistic theory in which dependency plays a crucial role and which serves as the theoretical basis for several dependency grammars.

pdf bib
Word Order in German: A Formal Dependency Grammar Using a Topological Hierarchy
Kim Gerdes | Sylvain Kahane
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics