Renata Savy
2014
VOLIP: a corpus of spoken Italian and a virtuous example of reuse of linguistic resources
Iolanda Alfano | Francesco Cutugno | Aurelio De Rosa | Claudio Iacobini | Renata Savy | Miriam Voghera
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The VoLIP corpus (The Voice of LIP) is an Italian speech resource that associates audio signals with the orthographic transcriptions of the LIP Corpus. The LIP Corpus was designed to represent diaphasic, diatopic and diamesic variation. It was collected in the early 1990s to compile a frequency lexicon of spoken Italian, and its size was tailored to produce a reliable frequency lexicon for the first 3,000 lemmas. It therefore consists of about 500,000 word tokens, corresponding to 60 hours of recording. The speech materials belong to five different text registers and were collected in four different cities. Thanks to a modern technological approach, the VoLIP web service allows users to search the LIP corpus using IMDI metadata or lexical or morpho-syntactic entry keys, and returns the audio portions aligned to the requested entry. The VoLIP corpus is freely available at the URL http://www.parlaritaliano.it.
2010
Pr.A.Ti.D: A Coding Scheme for Pragmatic Annotation of Dialogues.
Renata Savy
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Our purpose is to propose and discuss the latest version of an integrated method for dialogue analysis, annotation and evaluation, using a set of different pragmatic parameters. The annotation scheme Pr.A.Ti.D was built on task-oriented dialogues. The dialogues are part of the CLIPS corpus of spoken Italian, which consists of spoken material stratified with regard to diatopic variation. A description of the multilevel annotation scheme is provided, discussing some problems of its design and formalisation in a DTD for XML mark-up. A further goal was to extend the use of Pr.A.Ti.D to other typologies of task-oriented texts and to verify the necessity and the amount of possible changes to the scheme, in order to make it more general and less oriented to specific purposes: a test on map task dialogues and the consequent modifications of the scheme are presented. The application of the scheme allowed us to extract pragmatic indexes typical of each text type, and to perform both a qualitative and a quantitative analysis of texts. Finally, from a linguistic perspective, a comparative analysis of conversational and communicative styles in dialogues performed by speakers belonging to different linguistic cultures and areas is proposed.
2006
Multilevel corpus analysis: generating and querying an AGset of spoken Italian (SpIt-MDb).
Renata Savy | Francesco Cutugno | Claudia Crocco
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In this paper we present an application of AGTK to a corpus of spoken Italian annotated at many different linguistic levels. The work consists of two parts: a) the presentation of AG-SpIt, a toolkit devoted to corpus data management that we developed according to AGTK proposals; b) the presentation of the corpus structure, together with some examples and results of cross-level linguistic analyses obtained by querying the database (SpIt-MDb). As this work is still an ongoing investigation, the results must be considered preliminary, serving as a demo that illustrates the potential of the tool and the advantages it offers for validating linguistic theories and annotation systems. Currently, SpIt-MDb is a linguistic resource under development; it represents one of the first attempts to create an Italian corpus labelled at various linguistic levels (from acoustic/sub-phonetic to textual/pragmatic ones) which can be queried across the interrelations among levels.
Co-authors
- Francesco Cutugno 2
- Claudia Crocco 1
- Iolanda Alfano 1
- Aurelio De Rosa 1
- Claudio Iacobini 1
Venues
- LREC 3