Eric Laporte

Also published as: Éric Laporte


Building Korean Linguistic Resource for NLU Data Generation of Banking App CS Dialog System
Jeongwoo Yoon | Onyu Park | Changhoe Hwang | Gwanghoon Yoo | Eric Laporte | Jeesun Nam
Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning

Natural language understanding (NLU) is integral to task-oriented dialog systems, but demands a considerable amount of annotated training data to increase the coverage of diverse utterances. In this study, we report the construction of a linguistic resource named FIAD (Financial Annotated Dataset) and its use to generate Korean annotated training data for NLU in the banking customer service (CS) domain. Through an empirical examination of a corpus of banking app reviews, we identified three linguistic patterns occurring in Korean request utterances: TOPIC (ENTITY, FEATURE), EVENT, and DISCOURSE MARKER. We represented them in LGGs (Local Grammar Graphs) to generate annotated data covering diverse intents and entities. To assess the practicality of the resource, we evaluate the performance of DIET-only (Intent: 0.91 / Topic [entity+feature]: 0.83), DIET+HANBERT (I: 0.94 / T: 0.85), DIET+KoBERT (I: 0.94 / T: 0.86), and DIET+KorBERT (I: 0.95 / T: 0.84) models trained on FIAD-generated data to extract various types of semantic items.

SSP-Based Construction of Evaluation-Annotated Data for Fine-Grained Aspect-Based Sentiment Analysis
Suwon Choi | Shinwoo Kim | Changhoe Hwang | Gwanghoon Yoo | Eric Laporte | Jeesun Nam
Proceedings of the First Workshop on Pattern-based Approaches to NLP in the Age of Deep Learning

We report the construction of a Korean evaluation-annotated corpus, hereafter called ‘Evaluation Annotated Dataset (EVAD)’, and its use in Aspect-Based Sentiment Analysis (ABSA), extended to cover e-commerce reviews containing sentiment and non-sentiment linguistic patterns. The annotation process uses Semi-Automatic Symbolic Propagation (SSP). We built extensive linguistic resources formalized as a Finite-State Transducer (FST) to annotate corpora with detailed ABSA components in the fashion e-commerce domain. To analyze user opinions more accurately and extract more detailed features of targets, the ABSA approach is extended by including aspect values in addition to topics and aspects, and by classifying aspect-value pairs depending on whether values are unary, binary, or multiple. For evaluation, the KoBERT and KcBERT models are trained on the annotated dataset, reaching F1 scores of 0.88 and 0.90, respectively, on recognition of aspect-value pairs.


Where Do Aspectual Variants of Light Verb Constructions Belong?
Aggeliki Fotopoulou | Eric Laporte | Takuya Nakamura
Proceedings of the 17th Workshop on Multiword Expressions (MWE 2021)

Expressions with an aspectual variant of a light verb, e.g. ‘take on debt’ vs. ‘have debt’, are frequent in texts but often difficult to classify as verbal idioms, light verb constructions, or compositional phrases. We investigate the properties of such expressions with a disputed membership and propose a selection of features that draw more satisfactory boundaries between the three categories in this zone, assigning each expression to one of them.


Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing
Jorge Baptista | Pushpak Bhattacharyya | Christiane Fellbaum | Mikel Forcada | Chu-Ren Huang | Svetla Koeva | Cvetana Krstev | Eric Laporte
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing


A new semantically annotated corpus with syntactic-semantic and cross-lingual senses
Myriam Rakho | Éric Laporte | Matthieu Constant
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this article, we describe a new sense-tagged corpus for Word Sense Disambiguation. The corpus consists of instances of 20 French polysemous verbs. Each verb instance is annotated with three sense labels: (1) the actual translation of the verb in the English version of this instance in a parallel corpus, (2) an entry of the verb in a computational dictionary of French (the Lexicon-Grammar tables), and (3) a fine-grained sense label resulting from the concatenation of the translation and the Lexicon-Grammar entry.


Integration of Data from a Syntactic Lexicon into Generative and Discriminative Probabilistic Parsers
Anthony Sigogne | Matthieu Constant | Éric Laporte
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

French parsing enhanced with a word clustering method based on a syntactic lexicon
Anthony Sigogne | Matthieu Constant | Éric Laporte
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages


Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications
Éric Laporte | Preslav Nakov | Carlos Ramisch | Aline Villavicencio
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications


Outilex, plate-forme logicielle de traitement de textes écrits
Olivier Blanc | Matthieu Constant | Éric Laporte
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

The Outilex software platform, which will be made available to research, development, and industry, comprises software components that perform all the fundamental operations of automatic written-text processing: processing without lexicons, use of lexicons and grammars, and management of linguistic resources. The data handled are structured in XML formats, and also in other, more compact formats, either readable or binary, when necessary; the required format converters are included in the platform; the grammar formats make it possible to combine statistical methods with methods based on linguistic resources. Finally, manually constructed lexicons of French and English from the LADL, with substantial coverage, will be distributed with the platform under the LGPL-LR license.

Graphes paramétrés et outils de lexicalisation
Éric Laporte | Sébastien Paumier
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters

Lexicalizing grammars reduces the number of parsing errors and improves the results of applications. However, this change affects every aspect of a parsing system. One of our research objectives is to develop a realistic model for grammar lexicalization. We carried out experiments to this end with a grammar that is very simple in content and formalism, and a highly informative syntactic lexicon, the lexicon-grammar of French developed at the LADL. The lexicalization method is that of parameterized graphs. Our results tend to show that most of the information contained in the lexicon-grammar can be transferred into a grammar and successfully exploited in the syntactic parsing of sentences.

Morphological annotation of Korean with Directly Maintainable Resources
Ivan Berlocher | Hyun-gue Huh | Eric Laporte | Jee-sun Nam
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This article describes an exclusively resource-based method of morphological annotation of written Korean text. Korean is an agglutinative language. Our annotator is designed to process text before the operation of a syntactic parser. In its present state, it annotates one-stem words only. The output is a graph of morphemes annotated with accurate linguistic information. The granularity of the tagset is 3 to 5 times higher than that of usual tagsets. A comparison with a reference annotated corpus showed that it achieves 89% recall without any corpus training. The language resources used by the system are lexicons of stems, transducers of suffixes, and transducers of generation of allomorphs. All can be easily updated, which allows users to control the evolution of the performance of the system. It has been claimed that morphological annotation of Korean text could only be performed by a morphological analysis module accessing a lexicon of morphemes. We show that it can also be performed directly with a lexicon of words and without applying morphological rules at annotation time, which speeds up annotation to 1,210 words. The lexicon of words is obtained from the maintainable language resources through a fully automated compilation process.


A Resource-based Korean Morphological Annotation System
Hyun-gue Huh | Éric Laporte
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts


Synthesis of Spoken Messages from Semantic Representations (Semantic-Representation-to-Speech System)
Laurence Danlos | Eric Laporte | Francoise Emerard
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics