Franco Alberto Cardillo


2024

pdf bib
Italian Word Embeddings for the Medical Domain
Franco Alberto Cardillo | Franca Debole
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Neural word embeddings have proven valuable in the development of medical applications. However, for the Italian language, there are no publicly available corpora, embeddings, or evaluation resources tailored to this domain. In this paper, we introduce an Italian corpus for the medical domain, that includes texts from Wikipedia, medical journals, drug leaflets, and specialized websites. Using this corpus, we generate neural word embeddings from scratch. These embeddings are then evaluated using standard evaluation resources, that we translated into Italian exploiting the concept graph in the UMLS Metathesaurus. Despite the relatively small size of the corpus, our experimental results indicate that the new embeddings correlate well with human judgments regarding the similarity and the relatedness of medical concepts. Moreover, these medical-specific embeddings outperform a baseline model trained on the full Wikipedia corpus, which includes the medical pages we used. We believe that our embeddings and the newly introduced textual resources will foster further advancements in the field of Italian medical Natural Language Processing.

2019

pdf bib
Linguistically-Driven Strategy for Concept Prerequisites Learning on Italian
Alessio Miaschi | Chiara Alzetta | Franco Alberto Cardillo | Felice Dell’Orletta
Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications

We present a new concept prerequisite learning method for Learning Object (LO) ordering that exploits only linguistic features extracted from textual educational resources. The method was tested in a cross- and in- domain scenario both for Italian and English. Additionally, we performed experiments based on a incremental training strategy to study the impact of the training set size on the classifier performances. The paper also introduces ITA-PREREQ, to the best of our knowledge the first Italian dataset annotated with prerequisite relations between pairs of educational concepts, and describe the automatic strategy devised to build it.