Nicolo’ Tamagnone
Also published as: Nicolò Tamagnone
2022
HumSet: Dataset of Multilingual Information Extraction and Classification for Humanitarian Crises Response
Selim Fekih, Nicolo’ Tamagnone, Benjamin Minixhofer, Ranjan Shrestha, Ximena Contla, Ewan Oglethorpe, Navid Rekabsaz
Findings of the Association for Computational Linguistics: EMNLP 2022
Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data, a process that can benefit greatly from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain. To enable the creation of such NLP systems, we introduce and release HumSet, a novel and rich multilingual dataset of humanitarian response documents annotated by experts in the humanitarian response community. The dataset provides documents in three languages (English, French, Spanish) and covers a variety of humanitarian crises from 2018 to 2021 across the globe. For each document, HumSet provides selected snippets (entries) as well as the classes assigned to each entry, annotated using common humanitarian information analysis frameworks. HumSet also provides novel and challenging entry extraction and multi-label entry classification tasks. In this paper, we take a first step towards approaching these tasks and conduct a set of experiments on Pre-trained Language Models (PLMs) to establish strong baselines for future research in this domain. The dataset is available at https://blog.thedeep.io/humset/.
2020
Top-Rank-Focused Adaptive Vote Collection for the Evaluation of Domain-Specific Semantic Models
Pierangelo Lombardo, Alessio Boiardi, Luca Colombo, Angelo Schiavone, Nicolò Tamagnone
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
The growth of domain-specific applications of semantic models, boosted by the recent achievements of unsupervised embedding learning algorithms, demands domain-specific evaluation datasets. In many cases, content-based recommenders being a prime example, these models are required to rank words or texts according to their semantic relatedness to a given concept, with particular focus on top ranks. In this work, we make a threefold contribution to address these requirements: (i) we define a protocol, based on adaptive pairwise comparisons, for constructing a relatedness-based evaluation dataset that is tailored to the available resources and optimized to be particularly accurate in top-rank evaluation; (ii) we define appropriate metrics, extensions of well-known ranking correlation coefficients, to evaluate a semantic model via the aforementioned dataset while taking into account the greater significance of top ranks; and (iii) we define a stochastic transitivity model to simulate semantic-driven pairwise comparisons, which confirms the effectiveness of the proposed dataset construction protocol.
Co-authors
- Alessio Boiardi 1
- Luca Colombo 1
- Ximena Contla 1
- Selim Fekih 1
- Pierangelo Lombardo 1