Pierre André Ménard

2022

A French Corpus of Québec’s Parliamentary Debates
Pierre André Ménard | Desislava Aleksandrova
Proceedings of the Workshop ParlaCLARIN III within the 13th Language Resources and Evaluation Conference

Parliamentary debates offer a window on political stances as well as a repository of linguistic and semantic knowledge. They provide insights into, and reasons for, laws and regulations that affect electors in their everyday lives. One such resource is the set of transcribed debates available online from the Assemblée Nationale du Québec (ANQ). This paper describes the effort to convert the online ANQ debates from various HTML formats into a standardized ParlaMint TEI annotated corpus, and to enrich it with annotations extracted from related unstructured lists of members and political parties. The resulting resource includes 88 years of debates over a span of 114 years with more than 33.3 billion words. The paper details the addition of linguistic annotations and presents a quantitative analysis of part-of-speech tags and of the distribution of utterances across the corpus.

2021

UD on Software Requirements: Application and Challenges
Naïma Hassert | Pierre André Ménard | Edith Galy
Proceedings of the Fifth Workshop on Universal Dependencies (UDW, SyntaxFest 2021)

2019

Multilingual sentence-level bias detection in Wikipedia
Desislava Aleksandrova | François Lareau | Pierre André Ménard
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

We propose a multilingual method for the extraction of biased sentences from Wikipedia, and use it to create corpora in Bulgarian, French and English. Sifting through the revision history of the articles that at some point had been considered biased and later corrected, we retrieve the last tagged and the first untagged revisions as the before/after snapshots of what was deemed a violation of Wikipedia’s neutral point of view policy. We extract the sentences that were removed or rewritten in that edit. The approach yields sufficient data even in the case of relatively small Wikipedias, such as the Bulgarian one, where 62k articles produced 5k biased sentences. We evaluate our method by manually annotating 520 sentences for Bulgarian and French, and 744 for English. We assess the level of noise and analyze its sources. Finally, we exploit the data with well-known classification methods to detect biased sentences. Code and datasets are hosted at https://github.com/crim-ca/wiki-bias.
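
The before/after comparison described above can be sketched with Python's standard `difflib`. This is a minimal illustration, not the code in the wiki-bias repository: it assumes both revisions have already been converted to plain text, uses a naive regex sentence splitter, and the function names `split_sentences` and `removed_or_rewritten` are hypothetical.

```python
import difflib
import re


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def removed_or_rewritten(before, after):
    """Return sentences of the last tagged revision that were deleted or
    replaced in the first untagged revision."""
    old = split_sentences(before)
    new = split_sentences(after)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    biased = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # 'delete' means the sentence was removed, 'replace' means it was rewritten;
        # 'insert' and 'equal' spans contain no sentence from the biased revision.
        if tag in ("delete", "replace"):
            biased.extend(old[i1:i2])
    return biased


before = "Paris is the capital of France. It is by far the most beautiful city. It has many museums."
after = "Paris is the capital of France. It is a major tourist destination. It has many museums."
print(removed_or_rewritten(before, after))  # → ['It is by far the most beautiful city.']
```

Comparing whole sentences rather than characters keeps each extracted unit a complete sentence, which matches the sentence-level framing of the task.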

Turning silver into gold: error-focused corpus reannotation with active learning
Pierre André Ménard | Antoine Mougeot
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019)

While high-quality gold standard annotated corpora are crucial for most tasks in natural language processing, many annotated corpora published in recent years, whether created by annotators or by tools, contain noisy annotations. These corpora can be viewed as silver rather than gold standards, even when they are used in evaluation campaigns or to compare system performance. As upgrading a silver corpus to gold level remains a challenge, we explore the application of active learning techniques to detect errors, using four datasets designed for document classification and part-of-speech tagging. Our results show that the proposed method for the seeding step improves the chance of finding incorrect annotations by a factor of 2.73 compared to random selection, a 14.71% increase over the baseline methods. Our query method increases error detection precision by an average factor of 1.78 over random selection, a 61.82% improvement over other query approaches.
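
The abstract reports gains as multiples of random selection. The sketch below shows one generic way to frame that comparison, under stated assumptions: some model (not shown) has already produced class probabilities for each annotated instance, instances whose annotated label receives the lowest probability are flagged first, and the "factor over random" is taken to be the flagged set's error precision divided by the corpus error rate. This is a simple disagreement-based ranking, not the paper's seeding or query method, and the function names and toy data are hypothetical.

```python
from typing import List, Sequence


def rank_suspicious(labels: Sequence[int], probs: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return the indices of the k instances whose annotated label gets the
    lowest model probability, i.e. the most likely annotation errors."""
    order = sorted(range(len(labels)), key=lambda i: probs[i][labels[i]])
    return order[:k]


def lift_over_random(selected: Sequence[int], is_error: Sequence[bool]) -> float:
    """Error-detection precision of the selection divided by the overall error
    rate, which is the expected precision of picking instances at random."""
    precision = sum(is_error[i] for i in selected) / len(selected)
    error_rate = sum(is_error) / len(is_error)
    return precision / error_rate


# Toy data (invented): annotated labels, model class probabilities,
# and which annotations are actually wrong.
labels = [0, 1, 1, 0, 1, 0]
probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.95, 0.05], [0.6, 0.4], [0.1, 0.9]]
is_error = [False, False, True, False, False, True]

picked = rank_suspicious(labels, probs, k=2)
print(picked)  # → [5, 2]
print(lift_over_random(picked, is_error))
```

Here both flagged instances are true errors while only a third of the corpus is erroneous, so the selection is about three times as precise as random picking.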

2017

GeoDict: an integrated gazetteer
Jacques Fize | Gaurav Shrivastava | Pierre André Ménard
Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)

Fine-grained domain classification of text using TERMIUM Plus
Gabriel Bernier-Colborne | Caroline Barrière | Pierre André Ménard
Proceedings of Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017)

PACTE: A collaborative platform for textual annotation
Pierre André Ménard | Caroline Barrière
Proceedings of the 13th Joint ISO-ACL Workshop on Interoperable Semantic Annotation (ISA-13)

2016

Classification of comment helpfulness to improve knowledge sharing among medical practitioners
Pierre André Ménard | Caroline Barrière
Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Contextual term equivalent search using domain-driven disambiguation
Caroline Barrière | Pierre André Ménard | Daphnée Azoulay
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)

This article presents a domain-driven algorithm for the task of term sense disambiguation (TSD). TSD aims at automatically choosing which term record from a term bank best represents the meaning of a term occurring in a particular context. In a translation environment, finding the contextually appropriate term record is necessary to access the proper equivalent to be used in the target language text. The term bank TERMIUM Plus, recently published as an open access repository, is chosen as a domain-rich resource for testing our TSD algorithm, using English and French as source and target languages. We devise an experiment using over 1300 English terms found in scientific articles, and show that our domain-driven TSD algorithm ranks the best term record, and therefore the best French equivalent, at an average rank of 1.69, compared to an average rank of 3.51 for a random baseline.
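
To make the comparison concrete, here is one way to write the metric, assuming (this is not stated in the abstract) that the random baseline orders the $n_i$ candidate records of term $i$ uniformly at random. With $r_i$ the rank of the correct record and $T$ the number of test terms:

$$\bar r = \frac{1}{T}\sum_{i=1}^{T} r_i, \qquad \mathbb{E}\left[\bar r_{\text{random}}\right] = \frac{1}{T}\sum_{i=1}^{T} \frac{n_i + 1}{2}.$$

Under this reading, a random-baseline average of 3.51 would correspond to about six candidate records per term on average, since $(\bar n + 1)/2 = 3.51$ gives $\bar n \approx 6.02$.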

2014

Linked Open Data and Web Corpus Data for noun compound bracketing
Pierre André Ménard | Caroline Barrière
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This research compares a linked open data resource (DBpedia) with web corpus data resources (Google Web Ngrams and Google Books Ngrams) for noun compound bracketing. Statistical analysis of large corpora has often been used for noun compound bracketing, and our goal is to introduce a linked open data (LOD) resource for this task. We describe its particularities and report its performance on the task. Results obtained on the resources tested individually are promising and show potential for DBpedia to be included in future hybrid systems.
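
For readers unfamiliar with the task: a three-word compound is bracketed either left, as in ((liver cell) antibody), or right, as in (liver (cell line)). The sketch below shows the classic corpus-based adjacency model, which compares how strongly the first and second word pairs are associated in n-gram counts. It illustrates the general n-gram technique only, not the DBpedia-based method evaluated in the paper, and the counts in `BIGRAM_COUNTS` are invented toy values.

```python
from typing import Dict, Tuple

# Toy bigram frequencies, invented for illustration (not Google Ngrams data).
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("liver", "cell"): 1200,
    ("cell", "antibody"): 300,
    ("cell", "line"): 8000,
}


def bracket(w1: str, w2: str, w3: str, counts: Dict[Tuple[str, str], int]) -> str:
    """Adjacency model: compare the (w1, w2) and (w2, w3) pairs and group the
    more strongly associated one. Raw frequency stands in for an association
    score; ties fall back to left bracketing (an assumption of this sketch)."""
    left = counts.get((w1, w2), 0)
    right = counts.get((w2, w3), 0)
    if left >= right:
        return f"(({w1} {w2}) {w3})"
    return f"({w1} ({w2} {w3}))"


print(bracket("liver", "cell", "antibody", BIGRAM_COUNTS))  # → ((liver cell) antibody)
print(bracket("liver", "cell", "line", BIGRAM_COUNTS))      # → (liver (cell line))
```

A resource such as DBpedia could, in principle, replace the count lookup with evidence that a word pair names a known entity, which is the kind of signal a hybrid system might combine with corpus statistics.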

Multiword noun compound bracketing using Wikipedia
Caroline Barrière | Pierre André Ménard
Proceedings of the First Workshop on Computational Approaches to Compound Analysis (ComAComA 2014)