Mei Gan


2022

The Financial Document Structure Extraction Shared Task (FinTOC 2022)
Juyeon Kang | Abderrahim Ait Azzi | Sandra Bellato | Blanca Carbajo Coronado | Mahmoud El-Haj | Ismail El Maarouf | Mei Gan | Ana Gisbert | Antonio Moreno Sandoval
Proceedings of the 4th Financial Narrative Processing Workshop @LREC2022

This paper describes the FinTOC-2022 Shared Task on structure extraction from financial documents, its participants' results, and their findings. The shared task was organized as part of the 4th Financial Narrative Processing Workshop (FNP 2022), held jointly with the 13th Edition of the Language Resources and Evaluation Conference (LREC 2022) in Marseille, France (El-Haj et al., 2022). It aimed to stimulate research on systems for extracting a table of contents (TOC) from investment documents (such as financial prospectuses) by detecting the document titles and organizing them hierarchically into a TOC. For the fourth edition of this shared task, three subtasks were presented to the participants: one with English documents, one with French documents, and one with Spanish documents. This year, we proposed a revised dataset for English and French compared to the previous editions of FinTOC, and added a new dataset for Spanish documents. The task attracted 6 submissions for each language from 4 teams. The most successful methods make use of textual, structural and visual features extracted from the documents and propose classification models for detecting titles and TOCs for all of the subtasks.

2021

The Financial Document Structure Extraction Shared Task (FinTOC2021)
Ismail El Maarouf | Juyeon Kang | Abderrahim Ait Azzi | Sandra Bellato | Mei Gan | Mahmoud El-Haj
Proceedings of the 3rd Financial Narrative Processing Workshop

FinSim-3: The 3rd Shared Task on Learning Semantic Similarities for the Financial Domain
Juyeon Kang | Ismail El Maarouf | Sandra Bellato | Mei Gan
Proceedings of the Third Workshop on Financial Technology and Natural Language Processing

2020

Inference Annotation of a Chinese Corpus for Opinion Mining
Liyun Yan | Danni E | Mei Gan | Cyril Grouin | Mathieu Valette
Proceedings of the Twelfth Language Resources and Evaluation Conference

Polarity classification (positive, negative or neutral opinion detection) is well developed in the field of opinion mining. However, existing tools, which perform with high accuracy on short sentences and explicit expressions, have limited success interpreting narrative phrases and inference contexts. In this article, we discuss an important aspect of opinion mining: inference. We give our definition of inference, classify its different types, provide an annotation framework and analyze the annotation results. While inferences are often studied in the field of natural language understanding (NLU), we propose to examine inference as it relates to opinion mining. First, based on linguistic analysis, we clarify what kind of sentence contains an inference. We define five types of inference: logical inference, pragmatic inference, lexical inference, enunciative inference and discursive inference. Second, we explain our annotation framework, which includes both inference detection and opinion mining. In short, this manual annotation determines whether or not a target contains an inference. If so, we then specify its inference type, polarity and topic. Using the results of this annotation, we observed several correlations which will be used to determine distinctive features for automatic inference classification in further research. We also present the results of three preliminary classification experiments.