Manuela Sassi


2012

From medical language processing to BioNLP domain
Gabriella Pardelli | Manuela Sassi | Sara Goggi | Stefania Biagioni
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents the results of terminological work on a reference corpus in the domain of Biomedicine. In particular, the research analyses the use of certain terms in Biomedicine in order to verify how they change over time, with the aim of retrieving from the net the very essence of documentation. The terminological sample contains words used in BioNLP and biomedicine and identifies which terms are passing from scientific publications to the daily press and which remain reserved to scientific production. The final aim of this work is to determine how scientific dissemination to an ever larger part of society enables the general public to approach communication on biomedical research and development. Its main source is a reference corpus made up of three main repositories from which information related to BioNLP and Biomedicine is extracted. The paper is divided into three sections: 1) an introduction dedicated to data extracted from scientific documentation; 2) a second section devoted to methodology and data description; 3) a third part containing a statistical representation of terms extracted from the archive: indexes and concordances make it possible to reflect on the use of certain terms in this field and offer possible keys for accessing knowledge extraction in the digital era.

2010

A Digital Archive of Research Papers in Computer Science
Manuela Sassi | Gabriella Pardelli | Stefania Biagioni | Carlo Carlesi | Sara Goggi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the results of terminological work conducted by the authors on a Digital Archives Net of the Italian National Research Council (CNR) in the field of Computer Science. In particular, the research analyses the use of certain terms in Computer Science in order to verify how they change over time, with the aim of retrieving from the net the very essence of documentation. Its main source is a reference corpus made up of 13,500 documents which collects the scientific production of CNR. This study is divided into three sections: 1) an introductory section dedicated to the data extracted from the scientific documentation; 2) a second section devoted to the description of the contents managed by the PUMA system; 3) a third part containing a statistical representation of terms extracted from the archive: comparison tables of the occurrences of the most used terms in the scientific documentation will be created, and diagrams with percentages for the most frequently used terms will also be displayed. Indexes and concordances will make it possible to reflect on the use of certain terms in this field and offer possible keys for accessing knowledge extraction.

2006

Natural Language Processing: A Terminological and Statistical Approach
Gabriella Pardelli | Manuela Sassi | Sara Goggi | Paola Orsolini
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The aim of this article is to provide a statistical representation of significant terms used in the field of Natural Language Processing from the 1960s to the present, in order to draft a survey of the most significant research trends in that period. By retrieving these keywords it should be possible to highlight the ebb and flow of some thematic topics. The NLP terminological sample derives from a database created for this purpose using the DBT software (Textual Data Base, ILC patent).

Next Generation Language Resources using Grid
Federico Calzolari | Eva Sassolini | Manuela Sassi | Sebastiana Cucurullo | Eugenio Picchi | Francesca Bertagna | Alessandro Enea | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents a case study concerning the challenges and requirements posed by next generation language resources, realized as an overall model of open, distributed and collaborative language infrastructure. If a sort of “new paradigm” for language resource sharing is required, we think that the emerging and still evolving technology connected to Grid computing is a very interesting and suitable one for a concrete realization of this vision. Given the current limitations of Grid computing, it is very important to test the new environment on basic language analysis tools, in order to get a sense of the potential and the possible limitations connected to its use in NLP. For this reason, we ran some experiments on a module of the Linguistic Miner, i.e. the extraction of linguistic patterns from restricted domain corpora. The Grid environment produced the expected results (reduction of the processing time, huge storage capacity, data redundancy) without any additional cost for the final user.

2004

Computational Lexicography and Carlo Emilio Gadda, Principe dell’Analisi e Duca della Buona Cognizione
Maria Luigia Ceccotti | Manuela Sassi
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Linguistic Miner: An Italian Linguistic Knowledge System
Eugenio Picchi | Maria Luigia Ceccotti | Sebastiana Cucurullo | Manuela Sassi | Eva Sassolini
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

From Weaver to the ALPAC Report
Gabriella Pardelli | Manuela Sassi | Sara Goggi
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)