Oksana Vladimirovna Belyaeva
2022
ISPRAS@FinTOC-2022 Shared Task: Two-stage TOC Generation Model
Anastasiia Bogatenkova | Oksana Vladimirovna Belyaeva | Andrew Igorevich Perminov | Ilya Sergeevich Kozlov
Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022
This paper describes our participation in the FinTOC-2022 Shared Task, “Financial Document Structure Extraction”. The competition comprises two subtasks: title detection and TOC generation. We propose a pipeline for both subtasks that consists of extracting document lines and the existing TOC, forming a feature matrix, and classifying the lines. The classification model consists of two classifiers: a binary classifier first separates title lines from non-title lines, and a second classifier then determines the level of each title. In the title detection task, we achieved F1 scores of 0.900, 0.778 and 0.558 for English, French and Spanish documents respectively. In the TOC generation task, we achieved 63.1, 41.5 and 40.79 for the same languages, measured as the harmonic mean of the Inex F1 score and the Inex level accuracy. With these results, our team took first place among the English and French submissions and second place among the Spanish submissions.
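The two-stage classification and the TOC metric described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`two_stage_predict`, `harmonic_mean`), the toy features (font size and a bold flag) and the toy decision rules are all assumptions made for illustration, and the real system uses trained classifiers over a much richer feature matrix.

```python
from typing import Callable, List, Optional, Sequence

Features = Sequence[float]


def two_stage_predict(
    rows: List[Features],
    is_title: Callable[[Features], bool],
    title_level: Callable[[Features], int],
) -> List[Optional[int]]:
    """Return a title level for each title line and None for non-title lines."""
    levels: List[Optional[int]] = []
    for row in rows:
        if is_title(row):
            # Stage 2 runs only on lines that stage 1 accepted as titles.
            levels.append(title_level(row))
        else:
            levels.append(None)
    return levels


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two scores, e.g. Inex F1 and Inex level accuracy."""
    if a + b == 0:
        return 0.0
    return 2 * a * b / (a + b)


# Hypothetical feature rows: [font_size, is_bold].
rows = [[18.0, 1.0], [10.0, 0.0], [14.0, 1.0]]


def toy_is_title(row: Features) -> bool:
    # Stand-in for the binary title classifier (assumption: bold means title).
    return row[1] == 1.0


def toy_title_level(row: Features) -> int:
    # Stand-in for the level classifier (assumption: larger font means higher level).
    return 1 if row[0] >= 16.0 else 2


predicted = two_stage_predict(rows, toy_is_title, toy_title_level)
```

Separating the stages this way means the level classifier never sees non-title lines, so it can be trained and evaluated on titles only.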