2024
SENSE-LM : A Synergy between a Language Model and Sensorimotor Representations for Auditory and Olfactory Information Extraction
Cédric Boscher | Christine Largeron | Véronique Eglin | Elöd Egyed-Zsigmond
Findings of the Association for Computational Linguistics: EACL 2024
The five human senses – vision, taste, smell, hearing, and touch – are key concepts that shape human perception of the world. The extraction of sensory references (i.e., expressions that evoke the presence of a sensory experience) in text corpora is a challenge of high interest, with many applications in various areas. In this paper, we propose SENSE-LM, an information extraction system tailored for the discovery of sensory references in large collections of textual documents. Based on the novel idea of combining the strength of large language models and linguistic resources such as sensorimotor norms, it addresses the task of sensory information extraction at a coarse-grained (sentence binary classification) and fine-grained (sensory term extraction) level. Our evaluation of SENSE-LM for two sensory functions, Olfaction and Audition, and comparison with state-of-the-art methods emphasize a significant leap forward in automating these complex tasks.
Unsupervised stance detection for social media discussions: A generic baseline
Maia Sutter | Antoine Gourru | Amine Trabelsi | Christine Largeron
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
With the ever-growing use of social media to express opinions on the national and international stage, unsupervised methods of stance detection are increasingly important to handle the task without costly annotation of data. The current unsupervised state-of-the-art models are designed for specific network types, either homophilic or heterophilic, and they fail to generalize to both. In this paper, we first analyze the generalization ability of recent baselines to these two very different network types. Then, we conduct extensive experiments with a baseline model based on text embeddings propagated with a graph neural network that generalizes well to heterophilic and homophilic networks. We show that it outperforms, on average, other state-of-the-art methods across the two network types. Additionally, we show that combining textual and network information outperforms using text only, and that the language model size has only a limited impact on the model performance.
2022
Community Topic: Topic Model Inference by Consecutive Word Community Discovery
Eric Austin | Osmar R. Zaïane | Christine Largeron
Proceedings of the 29th International Conference on Computational Linguistics
We present our novel, hyperparameter-free topic modelling algorithm, Community Topic. Our algorithm is based on mining communities from term co-occurrence networks. We empirically evaluate and compare Community Topic with Latent Dirichlet Allocation and the recently developed top2vec algorithm. We find that Community Topic runs faster than the competitors and produces topics that achieve higher coherence scores. Community Topic can discover coherent topics at various scales. The network representation used by Community Topic results in a natural relationship between topics and a topic hierarchy. This allows sub- and super-topics to be found on demand. These features make Community Topic the ideal tool for downstream applications such as applied research and conversational agents.
2015
QASSIT: A Pretopological Framework for the Automatic Construction of Lexical Taxonomies from Raw Texts
Guillaume Cleuziou | Davide Buscaldi | Gael Dias | Vincent Levorato | Christine Largeron
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)