Ismail El Maarouf

Also published as: Ismaïl El Maarouf, Ismail El Maarouf

2022

This paper describes the FinTOC-2022 Shared Task on structure extraction from financial documents, its participants' results and their findings. This shared task was organized as part of The 4th Financial Narrative Processing Workshop (FNP 2022), held jointly at The 13th Edition of the Language Resources and Evaluation Conference (LREC 2022), Marseille, France (El-Haj et al., 2022). This shared task aimed to stimulate research in systems for extracting table-of-contents (TOC) from investment documents (such as financial prospectuses) by detecting the document titles and organizing them hierarchically into a TOC. For the fourth edition of this shared task, three subtasks were presented to the participants: one with English documents, one with French documents and one with Spanish documents. This year, we proposed a revised dataset for English and French compared to the previous editions of FinTOC, and added a new dataset for Spanish documents. The task attracted 6 submissions for each language from 4 teams. The most successful methods use textual, structural and visual features extracted from the documents and propose classification models for detecting titles and TOCs for all of the subtasks.

FinSim4-ESG Shared Task: Learning Semantic Similarities for the Financial Domain. Extended edition to ESG insights
Juyeon Kang | Ismail El Maarouf
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

This paper describes the FinSim4-ESG shared task, organized as part of the 4th FinNLP workshop, which is held in conjunction with the IJCAI-ECAI-2022 conference. This year, FinSim4 is extended to Environment, Social and Governance (ESG) insights and proposes two subtasks, one for ESG Taxonomy Enrichment and the other for Sustainable Sentence Prediction. Among the 28 teams registered for the shared task, a total of 8 teams submitted their system results and 6 teams also submitted a paper describing their method. The winner of each subtask shows good performance results of 0.85% and 0.95% in terms of accuracy, respectively.

2021

FinSim-3: The 3rd Shared Task on Learning Semantic Similarities for the Financial Domain
Juyeon Kang | Ismail El Maarouf | Sandra Bellato | Mei Gan
Proceedings of the Third Workshop on Financial Technology and Natural Language Processing

2020

The Financial Document Structure Extraction Shared task (FinToc 2020)
Najah-Imane Bentabet | Rémi Juge | Ismail El Maarouf | Virginie Mouilleron | Dialekti Valsamou-Stanislawski | Mahmoud El-Haj
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

This paper presents the FinTOC-2020 Shared Task on structure extraction from financial documents, its participants' results and their findings. This shared task was organized as part of The 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation (FNP-FNS 2020), held at The 28th International Conference on Computational Linguistics (COLING’2020). This shared task aimed to stimulate research in systems for extracting table-of-contents (TOC) from investment documents (such as financial prospectuses) by detecting the document titles and organizing them hierarchically into a TOC. For the second edition of this shared task, two subtasks were presented to the participants: one with English documents and the other with French documents.

The FinSim 2020 Shared Task: Learning Semantic Representations for the Financial Domain
Ismail El Maarouf | Youness Mansar | Virginie Mouilleron | Dialekti Valsamou-Stanislawski
Proceedings of the Second Workshop on Financial Technology and Natural Language Processing

2015

Barbecued Opakapaka: Using Semantic Preferences for Ontology Population
Ismail El Maarouf | Georgiana Marsic | Constantin Orăsan
Proceedings of the International Conference Recent Advances in Natural Language Processing

The GuanXi network: a new multilingual LLOD for Language Learning applications
Ismail El Maarouf | Hatem Mousselly-Sergieh | Eugene Alferov | Haofen Wang | Zhijia Fang | Doug Cooper
Proceedings of the Second Workshop on Natural Language Processing and Linked Open Data

2014

Disambiguating Verbs by Collocation: Corpus Lexicography meets Natural Language Processing
Ismail El Maarouf | Jane Bradbury | Vít Baisa | Patrick Hanks
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper reports the results of Natural Language Processing (NLP) experiments in semantic parsing, based on a new semantic resource, the Pattern Dictionary of English Verbs (PDEV) (Hanks, 2013). This work is set in the DVC (Disambiguating Verbs by Collocation) project, a Corpus Lexicography project aimed at expanding PDEV to a large scale. This project springs from a long-term collaboration of lexicographers with computer scientists, which has given rise to the design and maintenance of specific, adapted, and user-friendly editing and exploration tools. Particular attention is drawn to the use of NLP deep semantic methods to help in data processing. Possible contributions of NLP include pattern disambiguation, the focus of this article. The present article explains how PDEV differs from other lexical resources and describes its structure in detail. It also presents new classification experiments on a subset of 25 verbs. The SVM model obtained a micro-average F1 score of 0.81.

UoW: NLP techniques developed at the University of Wolverhampton for Semantic Similarity and Textual Entailment
Rohit Gupta | Hanna Béchara | Ismail El Maarouf | Constantin Orăsan
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)

2013

An empirical classification of verbs based on Semantic Types: the case of the ‘poison’ verbs.
Jane Bradbury | Ismail El Maarouf
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

Automatic classification of semantic patterns from the Pattern Dictionary of English Verbs
Ismaïl El Maarouf | Vít Baisa
Proceedings of the Joint Symposium on Semantic Processing. Textual Inference and Structures in Corpora

2012

Parenthetical Classification for Information Extraction
Ismail El Maarouf | Jeanne Villaneau
Proceedings of COLING 2012: Posters

A French Fairy Tale Corpus syntactically and semantically annotated
Ismaïl El Maarouf | Jeanne Villaneau
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Fairy tales, folktales and, more generally, children's stories have lately attracted the Natural Language Processing (NLP) community. Yet very few corpora exist and linguistic resources are lacking. The work presented in this paper aims at filling this gap by presenting a syntactically and semantically annotated corpus. It focuses on the linguistic analysis of a Fairy Tales Corpus, and describes the syntactic and semantic resources developed for Information Extraction. Resources include syntactic dependency relation annotation for 120 verbs; referential annotation, which is concerned with annotating each anaphoric occurrence and Proper Name with the most specific noun in the text; ontology matching for a substantial part of the nouns in the corpus; and semantic role labelling for 41 verbs using the FrameNet database. The article also sums up previous analyses of this corpus and indicates possible uses of this corpus for the NLP community.

2011

Extraction de patrons sémantiques appliquée à la classification d’Entités Nommées (Extraction of semantic patterns applied to the classification of named entities)
Ismaïl El Maarouf | Jeanne Villaneau | Sophie Rosset
Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Corpus variability is a major problem for named entity recognition systems. One possible remedy is to use linguistic approaches to adapt these systems to new contexts: building semantic patterns can help disambiguate named entities by structuring their syntactic-semantic environment. This article presents a first implementation of a correction system on a press corpus. After a segmentation step based on surface discourse criteria, the system extracts and weights the patterns associated with a named entity class supplied by an analyzer. Although the models are still relatively basic, the results obtained are encouraging and show the need for a more thorough treatment of the Organisation class.