Francesco Cutugno


Modelling Filled Particles and Prolongation Using End-to-end Automatic Speech Recognition Systems: A Quantitative and Qualitative Analysis.
Vincenzo Norman Vitale | Loredana Schettino | Francesco Cutugno
Proceedings of the 10th Italian Conference on Computational Linguistics (CLiC-it 2024)

State-of-the-art automatic speech recognition systems based on End-to-End models (E2E-ASRs) achieve remarkable performances. However, phenomena that characterize spoken language such as fillers (eeh ehm) or segmental prolongations (theee) are still mostly considered as disrupting objects that should not be included to obtain optimal transcriptions, despite their acknowledged regularity and communicative value. A recent study showed that two types of pre-trained systems with the same Conformer-based encoding architecture but different decoders – a Connectionist Temporal Classification (CTC) decoder and a Transducer decoder – tend to model some speech features that are functional for the identification of filled pauses and prolongations in speech. This work builds upon these findings by investigating which of the two systems performs better at filler and prolongation detection tasks and by conducting an error analysis to deepen our understanding of how these systems work.


VOLIP: a corpus of spoken Italian and a virtuous example of reuse of linguistic resources
Iolanda Alfano | Francesco Cutugno | Aurelio De Rosa | Claudio Iacobini | Renata Savy | Miriam Voghera
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

The VoLIP corpus (The Voice of LIP) is an Italian speech resource which associates audio signals with the orthographic transcriptions of the LIP Corpus. The LIP Corpus was designed to represent diaphasic, diatopic and diamesic variation. The Corpus was collected in the early '90s to compile a frequency lexicon of spoken Italian, and its size was tailored to produce a reliable frequency lexicon for the first 3,000 lemmas. It therefore consists of about 500,000 word tokens for 60 hours of recording. The speech materials belong to five different text registers and were collected in four different cities. Thanks to a modern technological approach, the VoLIP web service allows users to search the LIP corpus using IMDI metadata and lexical or morpho-syntactic entry keys, receiving as a result the audio portions aligned to the corresponding entry. The VoLIP corpus is freely available at the URL


W-PhAMT: A web tool for phonetic multilevel timeline visualization
Francesco Cutugno | Vincenza Anna Leano | Antonio Origlia
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents a web platform with its own graphic environment to visualize and filter multilevel phonetic annotations. The tool accepts Annotation Graph XML and Praat TextGrid files as input and converts them into a specific XML format. The XML output is used to browse data by means of a web tool using a visualization metaphor, namely a timeline. A timeline is a graphical representation of a period of time, on which relevant events are marked. Events are usually distributed over many layers in a geometrical metaphor represented by segments and points spatially distributed with reference to a temporal axis. The tool shows all the annotations included in the uploaded dataset and allows users to listen to the entire file or to parts of it. Filtering is allowed on annotation labels by means of string pattern matching. The web service includes cloud services to share data with other users. The tool is available at
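The multilevel timeline and label filtering described in this abstract can be illustrated with a minimal Python sketch. It is not the W-PhAMT implementation: the `Interval` class, the `filter_by_label` function, the tier names, and the sample data are all hypothetical, and regular expressions are assumed as one possible form of string pattern matching.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One annotated event on a timeline layer (hypothetical structure)."""
    tier: str     # name of the annotation layer, e.g. "phones"
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # annotation label attached to the segment


def filter_by_label(intervals, pattern):
    """Keep the intervals whose label matches the given regular expression."""
    rx = re.compile(pattern)
    return [iv for iv in intervals if rx.search(iv.label)]


# Invented sample data: three layers sharing one temporal axis.
annotations = [
    Interval("phones", 0.00, 0.12, "k"),
    Interval("phones", 0.12, 0.30, "a"),
    Interval("syllables", 0.00, 0.30, "ka"),
    Interval("words", 0.00, 0.62, "casa"),
]

# Select labels that are exactly "k" or "ka", across all layers.
matches = filter_by_label(annotations, r"^ka?$")
for iv in matches:
    print(f"{iv.tier}: [{iv.start:.2f}, {iv.end:.2f}] {iv.label}")
```

Because every interval carries its own start and end time, the filtered events can still be drawn against the common temporal axis, whichever layer they come from.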


New Features in Spoken Language Search Hawk (SpLaSH): Query Language and Query Sequence
Sara Romano | Francesco Cutugno
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this work we present further development of the SpLaSH (Spoken Language Search Hawk) project. SpLaSH implements a data model for annotated speech corpora integrated with textual markup (i.e. POS tagging, syntax, pragmatics), including a toolkit used to perform complex queries across speech and text labels. The integration of time aligned annotations (TMA), represented using Annotation Graphs, with text aligned ones (TXA), stored in generic XML files, is provided by a data structure, the Connector Frame, which acts as a look-up table linking temporal data to words in the text. SpLaSH imposes a very limited number of constraints on the data model design, allowing the integration of annotations developed separately within the same dataset and without any relative dependency. It also provides a GUI allowing three types of queries: simple query on TXA or TMA structures, sequence query on TMA structure and cross query on both TXA and TMA integrated structures. In this work new SpLaSH features will be presented: SpLaSH Query Language (SpLaSHQL) and Query Sequence.
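The role of the Connector Frame as a look-up table between text-aligned and time-aligned annotations can be sketched as follows. This is only an illustration under stated assumptions, not SpLaSH code or SpLaSHQL: the dictionary layout, the `cross_query` function, and the sample tokens and timings are hypothetical.

```python
# TXA (hypothetical): text-aligned annotation, one (word, POS tag) per token.
txa_tokens = [("la", "DET"), ("casa", "NOUN"), ("rossa", "ADJ")]

# TMA (hypothetical): time-aligned tiers of (start, end, label) segments.
tma_tiers = {
    "phones": [
        (0.00, 0.08, "l"), (0.08, 0.15, "a"),
        (0.15, 0.25, "k"), (0.25, 0.35, "a"),
        (0.35, 0.45, "s"), (0.45, 0.55, "a"),
    ],
}

# Connector table (hypothetical layout): token index -> (start, end) in seconds.
connector = {0: (0.00, 0.15), 1: (0.15, 0.55), 2: (0.55, 0.95)}


def cross_query(pos_tag, tier):
    """Return (word, labels) pairs: the labels of `tier` that fall inside
    the time span of each token carrying the given POS tag."""
    results = []
    for index, (word, tag) in enumerate(txa_tokens):
        if tag != pos_tag:
            continue
        t0, t1 = connector[index]  # look-up from text position to time span
        labels = [
            label
            for start, end, label in tma_tiers[tier]
            if start >= t0 and end <= t1
        ]
        results.append((word, labels))
    return results


print(cross_query("NOUN", "phones"))
```

Keeping the link in a separate table means that neither annotation has to reference the other directly, which is in line with the abstract's point that annotations developed separately can be integrated without any relative dependency.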


Multilevel corpus analysis: generating and querying an AGset of spoken Italian (SpIt-MDb).
Renata Savy | Francesco Cutugno | Claudia Crocco
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present an application of AGTK to a corpus of spoken Italian annotated at many different linguistic levels. The work consists of two parts: a) the presentation of AG-SpIt, a toolkit devoted to corpus data management that we developed according to AGTK proposals; b) the presentation of the corpus structure together with some examples and results of cross-level linguistic analyses obtained by querying the database (SpIt-MDb). As this work is still an ongoing investigation, results must be considered preliminary, as a ‘demo’ illustrating the potential of the tool and the advantages it offers for validating linguistic theories and annotation systems. Currently, SpIt-MDb is a linguistic resource under development; it represents one of the first attempts to create an Italian corpus labelled at various linguistic levels (from acoustic/sub-phonetic to textual/pragmatic ones) in which the interrelations among levels can be queried.

An observatory on Spoken Italian linguistic resources and descriptive standards.
Miriam Voghera | Francesco Cutugno
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

We present the national project “Parlare italiano: osservatorio degli usi linguistici”, funded by the Italian Ministry of Education, Scientific Research and University (PRIN 2004). Ten research groups from various Italian universities participate in the project. The project has four fundamental objectives: 1) to plan a national website that collects the most recent theoretical and applied results on spoken language; 2) to create an observatory of the linguistic usages of spoken Italian; 3) to delineate and implement standard and formalized methods and procedures for the study of spoken language; 4) to develop a training program for young researchers. The website will be accessible starting from November 2006.