Marie-Claude L’Homme

Also published as: Marie-Claude L’ Homme


2020

Automatic Term Extraction from Newspaper Corpora: Making the Most of Specificity and Common Features
Patrick Drouin | Jean-Benoît Morel | Marie-Claude L’ Homme
Proceedings of the 6th International Workshop on Computational Terminology

The first step of any terminological work is to set up a reliable, specialized corpus composed of documents written by specialists and then to apply automatic term extraction (ATE) methods to this corpus in order to retrieve a first list of potential terms. The experiment we describe in this paper differs quite drastically from this usual process, since we apply ATE to unspecialized corpora. The corpus used for this study was built from newspaper articles retrieved from the Web using a short list of keywords. The general intuition on which this research is based is that ATE-based corpus comparison techniques can be used to capture both similarities and dissimilarities between corpora. Dissimilarities are exploited through a termhood measure and similarities through word embeddings. Our initial results were validated manually and show that combining a traditional ATE method that focuses on dissimilarities between corpora with newer methods that exploit similarities (more specifically, distributional features of candidates) leads to promising results.

Building Multilingual Specialized Resources Based on FrameNet: Application to the Field of the Environment
Marie-Claude L’ Homme | Benoît Robichaud | Carlos Subirats
Proceedings of the International FrameNet Workshop 2020: Towards a Global, Multilingual FrameNet

The methodology developed within the FrameNet project is being used to compile resources in an increasing number of specialized fields of knowledge. The methodology along with the theoretical principles on which it is based, i.e. Frame Semantics, are especially appealing as they allow domain-specific resources to account for the conceptual background of specialized knowledge and to explain the linguistic properties of terms against this background. This paper presents a methodology for building a multilingual resource that accounts for terms of the environment. After listing some lexical and conceptual differences that need to be managed in such a resource, we explain how the FrameNet methodology is adapted for describing terms in different languages. We first applied our methodology to French and then extended it to English. Extensions to Spanish, Portuguese and Chinese were made more recently. Up to now, we have defined 190 frames: 112 frames are new; 38 are used as such; and 40 are slightly different (a different number of obligatory participants; a significant alternation, etc.) when compared to Berkeley FrameNet.

2018

Browsing the Terminological Structure of a Specialized Domain: A Method Based on Lexical Functions and their Classification
Marie-Claude L’Homme | Benoît Robichaud | Nathalie Prévil
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Lexical Profiling of Environmental Corpora
Patrick Drouin | Marie-Claude L’Homme | Benoît Robichaud
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

A Proposal for combining “general” and specialized frames
Marie-Claude L’ Homme | Carlos Subirats | Benoît Robichaud
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)

The objectives of the work described in this paper are: 1. To list the differences between a general language resource (namely FrameNet) and a domain-specific resource; 2. To devise solutions to merge their contents in order to increase the coverage of the general resource. Both resources are based on Frame Semantics (Fillmore 1985; Fillmore and Baker 2010) and this raises specific challenges since the theoretical framework and the methodology derived from it provide for both a lexical description and a conceptual representation. We propose a series of strategies that handle both lexical and conceptual (frame) differences and implement them in the specialized resource. We also show that most differences can be handled in a straightforward manner. However, some differences (such as frames defined exclusively for the specialized domain or relations between these frames) are likely to be much more difficult to take into account since they are domain-specific.

2015

Pourquoi construire des ressources terminologiques et pourquoi le faire différemment ? (Why build terminological resources, and why do it differently?)
Marie-Claude L’Homme
Actes de la 22e conférence sur le Traitement Automatique des Langues Naturelles. Conférences invitées

In this talk, I will argue that terminological resources describing the lexico-semantic properties of terms are a necessary, even indispensable, complement to other types of resources. Using English and French examples drawn from the field of the environment, I will first show that general lexical resources (including those with broad coverage) do not give a complete picture of the meaning of terms or of the lexical structure observed from the standpoint of a specialized domain. I will then show that terminological resources (thesauri, ontologies, term banks), which are often conceptually oriented, concentrate on the link between terms and the knowledge they denote and pay little attention to how terms behave linguistically. I will present a type of resource that describes the lexico-semantic properties of the terms of a domain (actantial structure, lexical relations, contextual annotations, etc.) along with the methodological elements that guide its construction.

2014

Frames and terminology: representing predicative terms in the field of the environment
Marie-Claude L’ Homme | Benoît Robichaud
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

Definition patterns for predicative terms in specialized lexical resources
Antonio San Martín | Marie-Claude L’Homme
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The research presented in this paper is part of a larger project on the semi-automatic generation of definitions of semantically-related terms in specialized resources. The work reported here involves the formulation of instructions to generate the definitions of sets of morphologically-related predicative terms, based on the definition of one of the members of the set. In many cases, it is assumed that the definition of a predicative term can be inferred by combining the definition of a related lexical unit with the information provided by the semantic relation (i.e. lexical function) that links them. In other words, terminographers only need to know the definition of “pollute” and the semantic relation that links it to other morphologically-related terms (“polluter”, “polluting”, “pollutant”, etc.) in order to create the definitions of the set. The results show that rules can be used to generate a preliminary set of definitions (based on specific lexical functions). They also show that more complex rules would need to be devised for other morphological pairs.

Discovering frames in specialized domains
Marie-Claude L’Homme | Benoît Robichaud | Carlos Subirats Rüggeberg
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper proposes a method for discovering semantic frames (Fillmore, 1982, 1985; Fillmore et al., 2003) in specialized domains. It is assumed that frames are especially relevant for capturing the lexical structure in specialized domains and that they complement structures such as ontologies that appear better suited to represent specific relationships between entities. The method we devised is based on existing lexical entries recorded in a specialized database related to the field of the environment (erode, impact, melt, recycling, warming). The frames and the data encoded in FrameNet are used as a reference. Selected information was extracted automatically from the database on the environment (and, when possible, compared to FrameNet), and presented to a linguist who analyzed this information to discover potential frames. Several different frames were discovered with this method. About half of them correspond to frames already described in FrameNet; some new frames were also defined and part of these might be specific to the field of the environment.

2012

Capturing syntactico-semantic regularities among terms: An application of the FrameNet methodology to terminology
Marie-Claude L’Homme | Janine Pimentel
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Terminological databases do not always provide detailed information on the linguistic behaviour of terms, although this is important for potential users such as translators or students. In this paper we describe a project that aims to fill this gap by proposing a method for annotating terms in sentences based on that developed within the FrameNet project (Ruppenhofer et al. 2010) and by implementing it in an online resource called DiCoInfo. We focus on the methodology we devised, and show with some preliminary results how similar actantial (i.e. argumental) structures can provide evidence for defining lexical relations in specific languages and capturing cross-linguistic equivalents. The paper argues that the syntactico-semantic annotation of the contexts in which the terms occur allows lexicographers to validate their intuitions concerning the linguistic behaviour of terms as well as interlinguistic relations between them. The syntactico-semantic annotation of contexts could, therefore, be considered a good starting point in terminology work that aims to describe the linguistic functioning of terms and offer a sounder basis to define interlinguistic relationships between terms that belong to different languages.

Semantic Relations Established by Specialized Processes Expressed by Nouns and Verbs: Identification in a Corpus by means of Syntactico-semantic Annotation
Nava Maroto | Marie-Claude L’Homme | Amparo Alcina
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This article presents the methodology and results of the analysis of terms referring to processes expressed by verbs or nouns in a corpus of specialized texts dealing with ceramics. Both noun and verb terms are explored in context in order to identify and represent the semantic roles held by their participants (arguments and circumstants), and therefore explore some of the relations established by these terms. We present a methodology for the identification of related terms that take part in the development of specialized processes and the annotation of the semantic roles expressed in these contexts. The analysis has allowed us to identify participants in the process, some of which were already present in our previous work, but also some new ones. This method is useful in the distinction of different meanings of the same verb. Contexts in which processes are expressed by verbs have proved to be very informative, even if they are less frequent in the corpus. This work is viewed as a first step in the implementation ― in ontologies ― of conceptual relations in which activities are involved.

2011

Attribution de rôles sémantiques aux actants des lexies verbales (Assigning semantic roles to actants of verbal lexical units)
Fadila Hadouche | Guy Lapalme | Marie-Claude L’Homme
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles courts

In this article, we address the assignment of semantic roles to the actants of verbal lexical units in a specialized French corpus. We propose a machine-learning classification of semantic roles based on a corpus of manually annotated verbal lexical units from the domain of computing and the Internet. We also propose a semi-supervised clustering method to handle the annotation of new lexical units or new semantic roles and integrate them into the system. This clustering method groups actant instances into clusters of similar instances according to the values they share for the features describing the actants. The semantic role classification obtained an F-measure of 93% for Patient, 90% for Agent, 85% for Destination and 76% for the other roles taken together. The clustering, which groups instances according to their similarity, yields an F-measure of 88% for Patient, 81% for Agent, 58% for Destination and 46% for the other roles.

2010

Identification des actants et circonstants par apprentissage machine (Identifying actants and circumstants with machine learning)
Fadila Hadouche | Guy Lapalme | Marie-Claude L’Homme
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

In this article, we address the automatic identification of the participants, both actants and circumstants, of verbal predicative lexical units drawn from a specialized French-language corpus. Actants contribute to the realization of the meaning of the lexical unit, whereas circumstants are optional: they add supplementary information that is not an integral part of the meaning of the unit. We propose a machine-learning classification of these participants based on a corpus of verbal lexical units from the domain of computing, which were manually annotated with semantic roles. We present features that allow us to identify participants and to distinguish actants from circumstants.

2006

A Methodology for Developing Multilingual Resources for Terminology
Marie-Claude L’Homme | Hee Sook Bae
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents a project that aims at building lexical resources for terminology. By lexical resources, we mean dictionaries that provide detailed lexico-semantic information on terms, i.e. lexical units the sense of which can be related to a special subject field. In terminology, there is a lack of such resources. The specific dictionaries we are currently developing describe basic French and Korean terms that belong to the fields of computer science and the Internet (e.g. computer, configure, user-friendly, Web, browse, spam). This paper presents the structure of the French and Korean articles: each component is examined and illustrated with examples. We then describe the corpus-based methodology and the different computer applications used for developing the articles. Our methodology comprises five steps: design of the corpora; selection of terms; sense distinction; definition of actantial structures; and listing of semantic relations. Details on the current state of each database are also given.

2004

A Lexico-semantic Approach to the Structuring of Terminology
Marie-Claude L’Homme
Proceedings of CompuTerm 2004: 3rd International Workshop on Computational Terminology

Discovering Specific Semantic Relationships between Nouns and Verbs in a Specialized French Corpus
Vincent Claveau | Marie-Claude L’Homme
Proceedings of CompuTerm 2004: 3rd International Workshop on Computational Terminology

2003

Teaching the automation of the translation process to future translators
Benoît Robichaud | Marie-Claude L’Homme
Workshop on Teaching Translation Technologies and Tools

This paper describes the approach used for introducing CAT tools and MT systems into a course offered in translation curricula at the Université de Montréal (Canada). It focuses on the automation of the translation process and presents various strategies that have been developed to help students progressively acquire the knowledge necessary to understand and undertake the tasks involved in the automation of translation. We begin with very basic principles and techniques, and move towards complex processes of advanced CAT and revision tools, including ultimately MT systems. As we will see, teaching concepts related to MT serves both as a wrap-up for the subjects dealt with during the semester and a way to highlight the tasks involved in the transfer phase of translation.