Tobias Daudert


2022

CoFiF Plus: A French Financial Narrative Summarisation Corpus
Nadhem Zmandar | Tobias Daudert | Sina Ahmadi | Mahmoud El-Haj | Paul Rayson
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Natural Language Processing is increasingly being applied in the finance and business industry to analyse the text of many different types of financial documents. Given the increasing growth of firms around the world, the volume of financial disclosures and financial texts in different languages and forms is increasing sharply, and therefore the study of language technology methods that automatically summarise content has grown rapidly into a major research area. Corpora for financial narrative summarisation exist in English, but there is a significant lack of financial text resources in the French language. To remedy this, we present CoFiF Plus, the first French financial narrative summarisation dataset providing a comprehensive set of financial texts written in French. The dataset has been extracted from French financial reports published in PDF file format. It is composed of 1,703 reports from the most capitalised companies in France (Euronext Paris) covering a time frame from 1995 to 2021. This paper describes the collection, annotation and validation of the financial reports and their summaries. It also describes the dataset and gives the results of some baseline summarisers. Our datasets will be openly available upon the acceptance of the paper.

2020

A Web-based Collaborative Annotation and Consolidation Tool
Tobias Daudert
Proceedings of the Twelfth Language Resources and Evaluation Conference

Annotation tools are a valuable asset for the construction of labelled textual datasets. However, they tend to have a rigid structure, closed back-end and front-end, and are built in a non-user-friendly way. These shortcomings hinder their use in annotation tasks requiring varied text formats, prevent researchers from optimising the tool for the annotation task, and keep people with little programming knowledge from easily modifying the tool, rendering it unusable for a large cohort. Targeting these needs, we present a web-based collaborative annotation and consolidation tool (AWOCATo), capable of supporting varied textual formats. AWOCATo is based on three pillars: (1) Simplicity, built with a modular architecture employing easy-to-use technologies; (2) Flexibility, the JSON configuration file allows an easy adaptation to the annotation task; (3) Customizability, parameters such as labels, colours, or consolidation features can be easily customized. These features allow AWOCATo to support a range of tasks and domains, filling the gap left by the absence of annotation tools that can be used by people with and without programming knowledge, including those who wish to easily adapt a tool to less common tasks. AWOCATo is available for download at https://github.com/TDaudert/AWOCATo.

2019

CoFiF: A Corpus of Financial Reports in French Language
Tobias Daudert | Sina Ahmadi
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

CoSACT: A Collaborative Tool for Fine-Grained Sentiment Annotation and Consolidation of Text
Tobias Daudert | Manel Zarrouk | Brian Davis
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

NUIG at the FinSBD Task: Sentence Boundary Detection for Noisy Financial PDFs in English and French
Tobias Daudert | Sina Ahmadi
Proceedings of the First Workshop on Financial Technology and Natural Language Processing

SemEval-2019 Task 9: Suggestion Mining from Online Reviews and Forums
Sapna Negi | Tobias Daudert | Paul Buitelaar
Proceedings of the 13th International Workshop on Semantic Evaluation

We present the pilot SemEval task on Suggestion Mining. The task consists of subtasks A and B, where we created labeled data from feedback forums and hotel reviews, respectively. Subtask A provides training and test data from the same domain, while Subtask B evaluates the system on a test dataset from a different domain than the available training data. 33 teams participated in the shared task, with a total of 50 members. We summarize the problem definition, benchmark dataset preparation, and methods used by the participating teams, providing details of the methods used by the top ranked systems. The dataset is made freely available to help advance the research in suggestion mining, and to reproduce the systems submitted under this task.

2018

Leveraging News Sentiment to Improve Microblog Sentiment Classification in the Financial Domain
Tobias Daudert | Paul Buitelaar | Sapna Negi
Proceedings of the First Workshop on Economics and Natural Language Processing

With the rising popularity of social media in society and in research, analysing texts short in length, such as microblogs, becomes an increasingly important task. As a medium of communication, microblogs carry people's sentiments and express them to the public. Given that sentiments are driven by multiple factors including the news media, the question arises whether the sentiment expressed in news and the news articles themselves can be leveraged to detect and classify sentiment in microblogs. Prior research has highlighted the impact of sentiments and opinions on the market dynamics, making the financial domain a prime case study for this approach. Therefore, this paper describes ongoing research dealing with the exploitation of news-contained sentiment to improve microblog sentiment classification in a financial context.

Linking News Sentiment to Microblogs: A Distributional Semantics Approach to Enhance Microblog Sentiment Classification
Tobias Daudert | Paul Buitelaar
Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis

Social media’s popularity in society and research is gaining momentum and simultaneously increasing the importance of short textual content such as microblogs. Microblogs are affected by many factors including the news media; therefore, we exploit sentiments conveyed from news to detect and classify sentiment in microblogs. Given that texts can deal with the same entity but might not be closely related when it comes to sentiment, it becomes necessary to introduce further measures ensuring the relatedness of texts while leveraging the contained sentiments. This paper describes ongoing research introducing distributional semantics to improve the exploitation of news-contained sentiment to enhance microblog sentiment classification.

2017

Analysing Market Sentiments: Utilising Deep Learning to Exploit Relationships within the Economy
Tobias Daudert
Proceedings of the Student Research Workshop Associated with RANLP 2017

In today’s world, globalisation is not only affecting inter-culturalism but also linking markets across the globe. Given that all markets affect each other and are driven not only by fundamental data but also by sentiments, sentiment analysis regarding the markets becomes a tool to predict, anticipate, and mitigate future economic crises such as the one we faced in 2008. In this paper, an approach to improve sentiment analysis by exploiting relationships among different kinds of sentiment, together with supplementary information, from and across various data sources is proposed.

SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
Keith Cortis | André Freitas | Tobias Daudert | Manuela Huerlimann | Manel Zarrouk | Siegfried Handschuh | Brian Davis
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper discusses the “Fine-Grained Sentiment Analysis on Financial Microblogs and News” task as part of SemEval-2017, specifically under the “Detecting sentiment, humour, and truth” theme. This task contains two tracks, where the first one concerns Microblog messages and the second one covers News Statements and Headlines. The main goal behind both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment scores for each text instance were floating point values in the range of -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. This task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.