Gaël Dias

Also published as: Gael Dias, Gäel Dias


2024

Analysing relevance of Discourse Structure for Improved Mental Health Estimation
Navneet Agarwal | Gaël Dias | Sonia Dollfus
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)

Automated depression estimation has received significant research attention in recent years as a result of its growing impact on the global community. Within the context of studies based on patient-therapist interview transcripts, most researchers treat the dyadic discourse as a sequence of unstructured sentences, thus ignoring the discourse structure within the learning process. In this paper we propose Multi-view architectures that divide the input transcript into patient and therapist views based on sentence type in an attempt to utilize symmetric discourse structure for improved model performance. Experiments on the DAIC-WOZ dataset for the binary classification task within depression estimation show the advantages of the Multi-view architecture over sequential input representations. Our model also outperforms the current state-of-the-art results and provides new SOTA performance on the test set of the DAIC-WOZ dataset.

Your Model Is Not Predicting Depression Well And That Is Why: A Case Study of PRIMATE Dataset
Kirill Milintsevich | Kairit Sirts | Gaël Dias
Proceedings of the 9th Workshop on Computational Linguistics and Clinical Psychology (CLPsych 2024)

This paper addresses the quality of annotations in mental health datasets used for NLP-based depression level estimation from social media texts. While previous research relies on social media-based datasets annotated with binary categories, i.e. depressed or non-depressed, recent datasets such as D2S and PRIMATE aim for nuanced annotations using PHQ-9 symptoms. However, most of these datasets rely on crowd workers without the domain knowledge needed for annotation. Focusing on the PRIMATE dataset, our study reveals concerns regarding annotation validity, particularly for the "lack of interest or pleasure" symptom. Through reannotation by a mental health professional, we introduce finer labels and textual spans as evidence, identifying a notable number of false positives. Our refined annotations, to be released under a Data Use Agreement, offer a higher-quality test set for anhedonia detection. This study underscores the necessity of addressing annotation quality issues in mental health datasets, advocating for improved methodologies to enhance NLP model reliability in mental health assessments.

Evaluating Lexicon Incorporation for Depression Symptom Estimation
Kirill Milintsevich | Gaël Dias | Kairit Sirts
Proceedings of the 6th Clinical Natural Language Processing Workshop

This paper explores the impact of incorporating sentiment, emotion, and domain-specific lexicons into a transformer-based model for depression symptom estimation. Lexicon information is added by marking the words in the input transcripts of patient-therapist conversations as well as in social media posts. Overall results show that the introduction of external knowledge within pre-trained language models can be beneficial for prediction performance, while different lexicons show distinct behaviours depending on the targeted task. Additionally, new state-of-the-art results are obtained for the estimation of depression level over patient-therapist interviews.

Analyzing Symptom-based Depression Level Estimation through the Prism of Psychiatric Expertise
Navneet Agarwal | Kirill Milintsevich | Lucie Metivier | Maud Rotharmel | Gaël Dias | Sonia Dollfus
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The ever-growing number of people suffering from mental distress has motivated significant research initiatives towards automated depression estimation. Despite the multidisciplinary nature of the task, very few of these approaches include medical professionals in their research process, thus ignoring a vital source of domain knowledge. In this paper, we propose to bring the domain experts back into the loop and incorporate their knowledge within the gold-standard DAIC-WOZ dataset. In particular, we define a novel transformer-based architecture and analyse its performance in light of our expert annotations. Overall findings demonstrate a strong correlation between the psychological tendencies of medical professionals and the behavior of the proposed model, which additionally provides new state-of-the-art results.

2022

Transfer Learning for Humor Detection by Twin Masked Yellow Muppets
Aseem Arora | Gaël Dias | Adam Jatowt | Asif Ekbal
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Humorous texts can be of different forms such as punchlines, puns, or funny stories. Existing humor classification systems have been dealing with such diverse forms by treating them independently. In this paper, we argue that different forms of humor share a common background either in terms of vocabulary or constructs. As a consequence, it is likely that classification performance can be improved by jointly tackling different humor types. Hence, we design a shared-private multitask architecture following a transfer learning paradigm and perform experiments over four gold standard datasets. Empirical results steadily confirm our hypothesis by demonstrating statistically-significant improvements over baselines and accounting for new state-of-the-art figures for two datasets.

2021

Understanding Feature Focus in Multitask Settings for Lexico-semantic Relation Identification
Houssam Akhmouch | Gaël Dias | Jose G. Moreno
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Stratégie Multitâche pour la Classification Multiclasse (A Multitask Strategy for Multiclass Classification)
Houssam Akhmouch | Hamza Bouanani | Gaël Dias | Jose G. Moreno
Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale

We propose an original idea for exploiting the relations between classes in multiclass problems. We define two one-vs-rest multitask architectures that combine sets of classifiers learned in a multitask setting using neural networks. Experiments conducted on six datasets for sentiment, emotion, topic, and lexico-semantic relation classification show that our architectures consistently improve performance over state-of-the-art one-vs-rest strategies and compete strongly with other multiclass strategies.

CTLR@WiC-TSV: Target Sense Verification using Marked Inputs and Pre-trained Models
José G. Moreno | Elvys Linhares Pontes | Gaël Dias
Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6)

2017

Demographic Word Embeddings for Racism Detection on Twitter
Mohammed Hasanuzzaman | Gaël Dias | Andy Way
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Most social media platforms grant users freedom of speech by allowing them to freely express their thoughts, beliefs, and opinions. Although this represents incredible and unique communication opportunities, it also presents important challenges. Online racism is such an example. In this study, we present a supervised learning strategy to detect racist language on Twitter based on word embeddings that incorporate demographic (Age, Gender, and Location) information. Our methodology achieves reasonable classification accuracy over a gold standard dataset (F1=76.3%) and significantly improves over the classification performance of demographic-agnostic models.

2016

Identifying Temporal Orientation of Word Senses
Mohammed Hasanuzzaman | Gaël Dias | Stéphane Ferrari | Yann Mathet | Andy Way
Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning

2015

QASSIT: A Pretopological Framework for the Automatic Construction of Lexical Taxonomies from Raw Texts
Guillaume Cleuziou | Davide Buscaldi | Gael Dias | Vincent Levorato | Christine Largeron
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)

2014

HulTech: A General Purpose System for Cross-Level Semantic Similarity based on Anchor Web Counts
Jose G. Moreno | Rumen Moraliyski | Asma Berrezoug | Gaël Dias
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

A Hybrid Segmentation of Web Pages for Vibro-Tactile Access on Touch-Screen Devices
Waseem Safi | Fabrice Maurel | Jean-Marc Routoure | Pierre Beust | Gaël Dias
Proceedings of the Third Workshop on Vision and Language

Recognize the Generality Relation between Sentences Using Asymmetric Association Measures
Sebastiao Pais | Gael Dias | Rumen Moraliyski
Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014)

In this paper we focus on a particular case of entailment, namely entailment by generality. We argue that there exist various types of implication, a range of different levels of entailment reasoning, based on lexical, syntactic, logical and common sense clues, at different levels of difficulty. We introduce the paradigm of Textual Entailment (TE) by Generality, which can be defined as the entailment from a specific statement towards a relatively more general statement. In this context, the Text T entails the Hypothesis H, and at the same time H is more general than T. We propose an unsupervised and language-independent method to recognize TE by Generality given a Text − Hypothesis (T − H) pair in which the entailment relation holds.

Unsupervised and Language Independent Method to Recognize Textual Entailment by Generality
Sebastiao Pais | Gael Dias | Joao Cordeiro | Rumen Moraliyski
Proceedings of the First International Conference on Computational Linguistics in Bulgaria (CLIB 2014)

In this work we introduce a particular case of textual entailment (TE), namely Textual Entailment by Generality (TEG). In text, there are different kinds of entailment yielded from different types of implicative reasoning (lexical, syntactic, common sense based), but here we focus just on TEG, which can be defined as an entailment from a specific statement towards a relatively more general one. Therefore, we have T (G)→ H whenever the premise T entails the hypothesis H, the hypothesis being more general than the premise. We propose an unsupervised and language-independent method to recognize TEGs, given a pair T, H in an entailment relation. We have evaluated our proposal through two experiments: (a) Test on T (G)→ H English pairs, where we know that TEG holds; (b) Test on T → H Portuguese pairs, randomly selected with 60% of TEGs and 40% of TE without generality dependency (TEnG).

Multi-Objective Search Results Clustering
Sudipta Acharya | Sriparna Saha | Jose G. Moreno | Gaël Dias
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

Easy Web Search Results Clustering: When Baselines Can Reach State-of-the-Art Algorithms
Jose G. Moreno | Gaël Dias
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

Propagation Strategies for Building Temporal Ontologies
Mohammed Hasanuzzaman | Gaël Dias | Stéphane Ferrari | Yann Mathet
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, volume 2: Short Papers

2013

Post-Retrieval Clustering Using Third-Order Similarity Measures
José G. Moreno | Gaël Dias | Guillaume Cleuziou
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Détection de mots-clés par approches au grain caractère et au grain mot (Keywords extraction by repeated string analysis) [in French]
Gaëlle Doualan | Mathieu Boucher | Romain Brixtel | Gaël Lejeune | Gaël Dias
JEP-TALN-RECITAL 2012, Workshop DEFT 2012: DÉfi Fouille de Textes (DEFT 2012 Workshop: Text Mining Challenge)

Workshop Proceedings of TextGraphs-7: Graph-based Methods for Natural Language Processing
Irina Matveeva | Ahmed Hassan | Gael Dias
Workshop Proceedings of TextGraphs-7: Graph-based Methods for Natural Language Processing

2011

A Contextual Classification Strategy for Polarity Analysis of Direct Quotations from Financial News
Brett Drury | Gaël Dias | Luís Torgo
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Paraphrase Alignment for Synonym Evidence Discovery
Gintarė Grigonytė | João Paulo Cordeiro | Gaël Dias | Rumen Moraliyski | Pavel Brazdil
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

Unsupervised Induction of Sentence Compression Rules
João Cordeiro | Gaël Dias | Pavel Brazdil
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

2008

Fully Unsupervised Graph-Based Discovery of General-Specific Noun Relationships from Web Corpora Frequency Counts
Gaël Dias | Raycho Mukelov | Guillaume Cleuziou
CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning

2007

Biology Based Alignments of Paraphrases for Sentence Compression
João Cordeiro | Gäel Dias | Guillaume Cleuziou
Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing

2006

Automatic Knowledge Representation using a Graph-based Algorithm for Language-Independent Lexical Chaining
Gaël Dias | Cláudia Santos | Guillaume Cleuziou
Proceedings of the Workshop on Information Extraction Beyond The Document

2004

Evaluation of Different Similarity Measures for the Extraction of Multiword Units in a Reinforcement Learning Environment
Gaël Dias | Sérgio Nunes
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Using Masks, Suffix Array-based Data Structures and Multidimensional Arrays to Compute Positional Ngram Statistics from Corpora
Alexandre Gil | Gaël Dias
Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment

Multiword Unit Hybrid Extraction
Gaël Dias
Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment

2001

Cognates alignment
António Ribeiro | Gaël Dias | Gabriel Lopes | João Mexia
Proceedings of Machine Translation Summit VIII

Some authors (Simard et al.; Melamed; Danielsson & Mühlenbock) have suggested measures of similarity of words in different languages so as to find extra clues for alignment of parallel texts. Cognate words, like ‘Parliament’ and ‘Parlement’, in English and French respectively, provide extra anchors that help to improve the quality of the alignment. In this paper, we extend an alignment algorithm proposed by Ribeiro et al. using typical contiguous and non-contiguous sequences of characters extracted using a statistically sound method (Dias et al.). With these typical sequences, we are able to find more reliable correspondence points and improve the alignment quality without resorting to heuristics to identify cognates.

2000

Extracting Textual Associations in Part-of-Speech Tagged Corpora
Gaël Dias | Sylvie Guilloré | José Gabriel Pereira Lopes
5th EAMT Workshop: Harvesting Existing Resources