Elvys Linhares-Pontes

Also published as: Elvys Linhares Pontes


2025

π-YALLI: a new corpus for Nahuatl language models / Yankuik nawatlahtolkorpus pampa tlahtolmachiotl
Juan-José Guzmán-Landa | Juan-Manuel Torres-Moreno | Martha Lorena Avendaño Garrido | Miguel Figueroa-Saavedra | Ligia Quintana-Torres | Graham Ranger | Carlos-Emiliano González-Gallardo | Elvys Linhares-Pontes | Patricia Velázquez-Morales | Luis-Gil Moreno-Jiménez
Proceedings of the 32nd Conference on Natural Language Processing (TALN), Volume 1: Original Scientific Articles

Nahuatl is a language with few computational resources, even though it is a living language spoken by around two million people. We built π-YALLI, a corpus that enables research on and development of dynamic and static language models (LMs). We measured the perplexity of π-YALLI and evaluated state-of-the-art LMs on a manually annotated semantic similarity corpus, comparing their performance with inter-annotator agreement. The results show the difficulty of working with this π-language, but at the same time open up interesting perspectives for the study of other NLP tasks on Nahuatl.
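Perplexity, used above to characterize the corpus, is the exponential of the average negative log-probability a model assigns to each evaluation token. The sketch below is only an illustration under stated assumptions: it uses a hypothetical add-one-smoothed unigram model as a stand-in for the static and dynamic LMs of the paper, and the token lists are toy examples, not π-YALLI data.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Return an add-one (Laplace) smoothed unigram probability function."""
    counts = Counter(tokens)
    total = len(tokens)
    vocab_size = len(counts) + 1  # +1 adds a single shared slot for unseen tokens

    def prob(token):
        return (counts.get(token, 0) + 1) / (total + vocab_size)

    return prob


def perplexity(prob, tokens):
    """exp of the mean negative log-probability over the evaluation tokens."""
    nll = -sum(math.log(prob(t)) for t in tokens) / len(tokens)
    return math.exp(nll)


# Toy tokens only; a real evaluation would use held-out corpus text
model = train_unigram(["in", "atl", "in", "tletl"])
print(perplexity(model, ["in", "atl", "calli"]))
```

Lower perplexity means the model is less "surprised" by the text; a model that assigns every token probability 1/4 has perplexity exactly 4.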

Backtesting sentiment signals for trading: evaluating the viability of alpha generation from sentiment analysis
Elvys Linhares Pontes | Carlos-Emiliano González-Gallardo | Georgeta Bordea | Jose G Moreno | Mohamed Ben Jannet | Yuxuan Zhao | Antoine Doucet
Proceedings of the Industrial Session of CORIA-TALN 2025

Sentiment analysis, widely used for product reviews, also influences financial markets by affecting asset prices through microblogs and news articles. Although research on sentiment-based finance is abundant, many studies focus on sentence-level classification and neglect its practical application to trading. This study fills that gap by evaluating sentiment-based trading strategies for generating positive alpha. We perform a backtesting analysis using sentiment predictions from three models (two classification-based and one regression-based) applied to news articles about Dow Jones 30 stocks, comparing them with the Buy&Hold baseline strategy. The results show that all models generated positive returns, with the regression model achieving the highest return, 50.63% over 28 months, thus outperforming the Buy&Hold strategy. This highlights the potential of sentiment analysis to refine investment strategies and improve financial decision-making.
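A backtest replays a trading rule over historical prices and compares its compounded return with a baseline. The sketch below assumes a hypothetical long/flat rule (hold the stock over the next day when the latest sentiment score is positive, stay in cash otherwise), ignores transaction costs, and uses toy prices; none of these choices are taken from the study.

```python
def simple_returns(prices):
    """Day-over-day simple returns: prices[t + 1] / prices[t] - 1."""
    return [prices[t + 1] / prices[t] - 1.0 for t in range(len(prices) - 1)]


def cumulative_return(daily_returns):
    """Compound simple daily returns into a single total return."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def backtest_sentiment(prices, sentiment, threshold=0.0):
    """Hypothetical long/flat rule versus Buy&Hold, ignoring transaction costs.

    sentiment[t] is the score known at the close of day t. The return from
    day t to day t + 1 is earned only if sentiment[t] > threshold, so each
    decision uses only information available at the time.
    Returns (strategy_total_return, buy_and_hold_total_return).
    """
    if len(sentiment) < len(prices) - 1:
        raise ValueError("need a sentiment score for every trading day but the last")
    rets = simple_returns(prices)
    strategy = [r if sentiment[t] > threshold else 0.0 for t, r in enumerate(rets)]
    return cumulative_return(strategy), cumulative_return(rets)


# Toy prices and scores, not Dow Jones 30 data
prices = [100.0, 110.0, 99.0, 108.9]
sentiment = [0.7, -0.4, 0.2]
strategy_ret, bh_ret = backtest_sentiment(prices, sentiment)
print(f"strategy {strategy_ret:.3f} vs Buy&Hold {bh_ret:.3f}")  # → strategy 0.210 vs Buy&Hold 0.089
```

Aligning sentiment[t] with the return from t to t + 1 is the key design choice: trading on a score that was only published after the price move would inflate the backtest through look-ahead bias.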

2024

L3iTC at the FinLLM Challenge Task: Quantization for Financial Text Classification & Summarization
Elvys Linhares Pontes | Carlos-Emiliano González-Gallardo | Mohamed Benjannet | Caryn Qu | Antoine Doucet
Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning

2023

Leveraging BERT Language Models for Multi-Lingual ESG Issue Identification
Elvys Linhares Pontes | Mohamed Benjannet | Lam Kim Ming
Proceedings of the Fifth Workshop on Financial Technology and Natural Language Processing and the Second Multimodal AI For Financial Forecasting

2022

Using Contextual Sentence Analysis Models to Recognize ESG Concepts
Elvys Linhares Pontes | Mohamed Ben Jannet | Jose G. Moreno | Antoine Doucet
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

This paper summarizes the joint participation of the Trading Central Labs and the L3i laboratory of the University of La Rochelle in both sub-tasks of the FinSim-4 shared task evaluation campaign. The first sub-task aims to enrich the ‘Fortia ESG taxonomy’ with new lexicon entries, while the second aims to classify sentences as either ‘sustainable’ or ‘unsustainable’ with respect to ESG (Environment, Social and Governance) related factors. For the first sub-task, we proposed a model based on pre-trained Sentence-BERT models that projects sentences and concepts into a common space in order to better represent ESG concepts. The official task results show that our system yields a significant performance improvement over the baseline and outperforms all other submissions on the first sub-task. For the second sub-task, we combine the RoBERTa model with a feed-forward multi-layer perceptron to extract the context of sentences and classify them. Our model achieved accuracy scores above 92% and was ranked among the top 5 systems.
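The idea of a common space for terms and concepts can be illustrated by attaching a candidate lexicon entry to the concept whose embedding is closest by cosine similarity. This is a minimal sketch under assumptions: the embeddings are presumed to come from some sentence encoder, nearest-neighbour assignment is an illustrative decision rule rather than the system's confirmed one, and the tiny vectors and concept labels are invented, not Sentence-BERT outputs or entries of the Fortia taxonomy.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_concept(term_vec, concept_vecs):
    """Return the concept label whose vector is most cosine-similar to term_vec."""
    return max(concept_vecs, key=lambda label: cosine(term_vec, concept_vecs[label]))


# Hypothetical 3-d embeddings standing in for sentence-encoder outputs
concepts = {
    "Emissions": [0.9, 0.1, 0.0],
    "Board diversity": [0.0, 0.2, 0.9],
}
new_term = [0.8, 0.2, 0.1]  # e.g. the embedding of a candidate lexicon entry
print(nearest_concept(new_term, concepts))  # → Emissions
```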

2021

Exploratory Analysis of News Sentiment Using Subgroup Discovery
Anita Valmarska | Luis Adrián Cabrera-Diego | Elvys Linhares Pontes | Senja Pollak
Proceedings of the 8th Workshop on Balto-Slavic Natural Language Processing

In this study, we present an exploratory analysis of a Slovenian news corpus, in which we investigate the association between named entities and sentiment in the news. We propose a methodology that combines Named Entity Recognition with Subgroup Discovery, a descriptive rule learning technique for identifying groups of examples that share the same class label (here, sentiment) and pattern (here, features given by named entities). We use the approach to induce rules for the positive and negative sentiment classes, which reveal interesting patterns related to different Slovenian and international politicians, organizations, and locations.
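Subgroup discovery scores candidate rules of the form "condition → class" with a quality measure. Weighted relative accuracy, WRAcc = p(Cond) · (p(Class | Cond) − p(Class)), is a common choice, though the source does not say which measure the study used. The sketch assumes each article is reduced to a set of named entities plus a sentiment label and exhaustively scores conjunctions of up to two entities; the entity names are placeholders, not findings from the Slovenian corpus.

```python
from itertools import combinations


def wracc(examples, condition, target):
    """Weighted relative accuracy of the rule `condition -> target`.

    examples: list of (entities: set of str, label: str)
    condition: frozenset of entities that must all appear in an example
    """
    n = len(examples)
    covered = [label for ents, label in examples if condition <= ents]
    if not covered:
        return 0.0
    p_cond = len(covered) / n
    p_target = sum(1 for _, label in examples if label == target) / n
    p_target_given_cond = sum(1 for label in covered if label == target) / len(covered)
    return p_cond * (p_target_given_cond - p_target)


def top_subgroups(examples, target, max_size=2, k=3):
    """Score every conjunction of up to max_size entities; return the k best."""
    entities = sorted(set().union(*(ents for ents, _ in examples)))
    candidates = [frozenset(c) for size in range(1, max_size + 1)
                  for c in combinations(entities, size)]
    scored = [(wracc(examples, c, target), sorted(c)) for c in candidates]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Placeholder entities and labels for illustration only
examples = [
    ({"PoliticianA", "OrgB"}, "negative"),
    ({"PoliticianA"}, "negative"),
    ({"OrgB", "CityC"}, "positive"),
    ({"CityC"}, "positive"),
]
print(top_subgroups(examples, "negative", k=2))
```

WRAcc rewards rules that are both general (large p(Cond)) and unusual (class distribution in the subgroup far from the overall one), which suits exploratory analysis better than plain accuracy.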

CTLR@WiC-TSV: Target Sense Verification using Marked Inputs and Pre-trained Models
José G. Moreno | Elvys Linhares Pontes | Gaël Dias
Proceedings of the 6th Workshop on Semantic Deep Learning (SemDeep-6)

2020

Alleviating Digitization Errors in Named Entity Recognition for Historical Documents
Emanuela Boros | Ahmed Hamdi | Elvys Linhares Pontes | Luis Adrián Cabrera-Diego | Jose G. Moreno | Nicolas Sidere | Antoine Doucet
Proceedings of the 24th Conference on Computational Natural Language Learning

This paper tackles the task of named entity recognition (NER) applied to digitized historical texts obtained from processing digital images of newspapers using optical character recognition (OCR) techniques. We argue that the main challenge for this task is that the OCR process leads to misspellings and linguistic errors in the output text. Moreover, historical variations can be present in aged documents, which can impact the performance of the NER process. We conduct a comparative evaluation on two historical datasets in German and French against previous state-of-the-art models, and we propose a model based on a hierarchical stack of Transformers to approach the NER task for historical data. Our findings show that the proposed model clearly improves the results on both historical datasets, and does not degrade the results for modern datasets.

2019

TLR at BSNLP2019: A Multilingual Named Entity Recognition System
Jose G. Moreno | Elvys Linhares Pontes | Mickael Coustaty | Antoine Doucet
Proceedings of the 7th Workshop on Balto-Slavic Natural Language Processing

This paper presents our participation in the shared task on multilingual named entity recognition at BSNLP2019. Our strategy is based on a standard neural architecture for sequence labeling. In particular, we use a mixed model that combines multilingual contextual and language-specific embeddings. Our only submitted run is based on a voting scheme using multiple models, one for each of the four languages of the task (Bulgarian, Czech, Polish, and Russian) and another for English. Results for named entity recognition are encouraging for all languages, varying from 60% to 83% in terms of Strict and Relaxed metrics, respectively.
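A voting scheme over several sequence labelers can be illustrated as token-level majority voting over aligned tag sequences. The source does not specify the voting rule, so this is a minimal sketch under assumptions: each model outputs one tag per token, ties go to the model listed first (for example, the language-specific model), and the tags are toy values.

```python
from collections import Counter


def vote(predictions):
    """Token-level majority vote over aligned tag sequences.

    predictions: list of tag sequences, one per model, all the same length.
    Ties are broken in favour of the model listed first.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must tag the same tokens")
    voted = []
    for i in range(length):
        column = [p[i] for p in predictions]
        counts = Counter(column)
        best = max(counts.values())
        # first tag, in model order, that reaches the top count
        voted.append(next(tag for tag in column if counts[tag] == best))
    return voted


# Toy BIO predictions from three hypothetical models
preds = [
    ["B-PER", "I-PER", "O"],
    ["B-PER", "O", "O"],
    ["B-ORG", "I-PER", "O"],
]
print(vote(preds))  # → ['B-PER', 'I-PER', 'O']
```

Voting token by token is simple but can yield invalid BIO sequences (an I- tag after O); voting over whole entity spans is an alternative that avoids this.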

2018

Predicting the Semantic Textual Similarity with Siamese CNN and LSTM
Elvys Linhares Pontes | Stéphane Huet | Andréa Carneiro Linhares | Juan-Manuel Torres-Moreno
Proceedings of the TALN Conference. Volume 1: Long Papers, Short Papers of TALN

Semantic Textual Similarity (STS) is the basis of many applications in Natural Language Processing (NLP). Our system combines convolutional and recurrent neural networks to measure the semantic similarity of sentences. It uses a convolutional network to take into account the local context of words and an LSTM to capture the global context of sentences. This combination of networks helps preserve the relevant information in sentences and improves the computation of similarity between them. Our model achieves good results and is competitive with the best state-of-the-art systems.

A New Annotated Portuguese/Spanish Corpus for the Multi-Sentence Compression Task
Elvys Linhares Pontes | Juan-Manuel Torres-Moreno | Stéphane Huet | Andréa Carneiro Linhares
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Multi-Sentence Compression with Word Vertex-Labeled Graphs and Integer Linear Programming
Elvys Linhares Pontes | Stéphane Huet | Thiago Gouveia da Silva | Andréa Carneiro Linhares | Juan-Manuel Torres-Moreno
Proceedings of the Twelfth Workshop on Graph-Based Methods for Natural Language Processing (TextGraphs-12)

Multi-Sentence Compression (MSC) aims to generate a short sentence containing the key information of a cluster of closely related sentences. MSC enables summarization and question-answering systems to generate outputs that combine fully formed sentences from one or several documents. This paper describes a new Integer Linear Programming method for MSC that uses a vertex-labeled graph to select different keywords, together with novel 3-gram scores, to generate more informative sentences while maintaining their grammaticality. Our system produces good-quality compressions and outperforms the state of the art in evaluations on news data. We conducted both automatic and manual evaluations to assess the informativeness and grammaticality of the compressions for each dataset. Additional tests that exploit the adjustable length of compressions show that ROUGE scores still improve with shorter output sentences.
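The word-graph idea behind this kind of MSC can be sketched briefly: identical (lowercased) words from the cluster share a vertex, each input sentence becomes a path from a start vertex to an end vertex, and a compression is a new start-to-end path. The paper selects paths with ILP, keyword vertex labels, and 3-gram scores; this sketch plainly swaps in an exhaustive depth-first search that keeps the path covering the most keywords and, among those, the fewest words. The merging rule, the `max_words` parameter, and the example sentences and keywords are assumptions for illustration.

```python
def build_word_graph(sentences):
    """Map each lowercased word to one vertex; add an edge for every adjacent pair."""
    edges = {}
    for sentence in sentences:
        path = ["<s>"] + sentence.lower().split() + ["</s>"]
        for a, b in zip(path, path[1:]):
            edges.setdefault(a, set()).add(b)
    return edges


def compress(sentences, keywords, max_words=12):
    """Exhaustive search over simple <s> ... </s> paths (a stand-in for the ILP)."""
    graph = build_word_graph(sentences)
    keywords = {k.lower() for k in keywords}
    best = None  # (score tuple, list of words)

    def dfs(node, path):
        nonlocal best
        if node == "</s>":
            words = path[1:-1]
            # more keywords first, then fewer words
            score = (len(keywords & set(words)), -len(words))
            if best is None or score > best[0]:
                best = (score, words)
            return
        if len(path) > max_words + 1:  # path holds <s> plus the words so far
            return
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path:  # simple paths only, so merged words cannot loop
                dfs(nxt, path + [nxt])

    dfs("<s>", ["<s>"])
    return " ".join(best[1]) if best else ""


cluster = ["heavy rain flooded the city center", "rain flooded streets yesterday"]
print(compress(cluster, ["rain", "flooded", "city"]))  # → rain flooded the city center
```

Exhaustive search is only viable for tiny graphs, since the number of simple paths grows exponentially; an ILP formulation handles the same kind of path selection with a solver and lets extra constraints (such as a target length) be added declaratively.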