Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction

Emmanuel Giguet, Gaël Lejeune, Jean-Baptiste Tanguy


Abstract
We present our contributions to the 2020 FinTOC Shared Tasks: Title Detection and Table of Contents Extraction. For the structure extraction task, we propose an approach that combines information from multiple sources: the table of contents, the wording of the document, and lexical domain knowledge. For the title detection task, we compare surface features with character-based features across various training configurations. We show that title detection results are very sensitive to the kind of training dataset used.
Anthology ID:
2020.fnp-1.30
Volume:
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation
Month:
December
Year:
2020
Address:
Barcelona, Spain (Online)
Editors:
Dr Mahmoud El-Haj, Dr Vasiliki Athanasakou, Dr Sira Ferradans, Dr Catherine Salzedo, Dr Ans Elhag, Dr Houda Bouamor, Dr Marina Litvak, Dr Paul Rayson, Dr George Giannakopoulos, Nikiforos Pittaras
Venue:
FNP
Publisher:
COLING
Pages:
174–180
URL:
https://aclanthology.org/2020.fnp-1.30
Cite (ACL):
Emmanuel Giguet, Gaël Lejeune, and Jean-Baptiste Tanguy. 2020. Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction. In Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation, pages 174–180, Barcelona, Spain (Online). COLING.
Cite (Informal):
Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction (Giguet et al., FNP 2020)
PDF:
https://aclanthology.org/2020.fnp-1.30.pdf