CoFiF Plus: A French Financial Narrative Summarisation Corpus

Nadhem Zmandar; Tobias Daudert; Sina Ahmadi; Mahmoud El-Haj; Paul Rayson

Abstract

Natural Language Processing is increasingly being applied in the finance and business industry to analyse the text of many different types of financial documents. Given the increasing growth of firms around the world, the volume of financial disclosures and financial texts in different languages and forms is increasing sharply, and the study of language technology methods that automatically summarise content has therefore grown rapidly into a major research area. Corpora for financial narrative summarisation exist in English, but there is a significant lack of financial text resources in the French language. To remedy this, we present CoFiF Plus, the first French financial narrative summarisation dataset, providing a comprehensive set of financial texts written in French. The dataset has been extracted from French financial reports published in PDF file format. It is composed of 1,703 reports from the most capitalised companies in France (Euronext Paris), covering a time frame from 1995 to 2021. This paper describes the collection, annotation and validation of the financial reports and their summaries. It also describes the dataset and gives the results of some baseline summarisers. Our datasets will be openly available upon the acceptance of the paper.

Anthology ID: 2022.lrec-1.174
Volume: Proceedings of the Thirteenth Language Resources and Evaluation Conference
Month: June
Year: 2022
Address: Marseille, France
Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
Venue: LREC
Publisher: European Language Resources Association
Pages: 1622–1639
URL: https://aclanthology.org/2022.lrec-1.174/
Cite (ACL): Nadhem Zmandar, Tobias Daudert, Sina Ahmadi, Mahmoud El-Haj, and Paul Rayson. 2022. CoFiF Plus: A French Financial Narrative Summarisation Corpus. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 1622–1639, Marseille, France. European Language Resources Association.
Cite (Informal): CoFiF Plus: A French Financial Narrative Summarisation Corpus (Zmandar et al., LREC 2022)
PDF: https://aclanthology.org/2022.lrec-1.174.pdf
