Claudia Soria


2024

Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024
Maite Melero | Sakriani Sakti | Claudia Soria
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024

2022

Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages
Maite Melero | Sakriani Sakti | Claudia Soria
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages

2020

Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)
Dorothee Beermann | Laurent Besacier | Sakriani Sakti | Claudia Soria
Proceedings of the 1st Joint Workshop on Spoken Language Technologies for Under-resourced languages (SLTU) and Collaboration and Computing for Under-Resourced Languages (CCURL)

2018

The DLDP Survey on Digital Use and Usability of EU Regional and Minority Languages
Claudia Soria | Valeria Quochi | Irene Russo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Traitement Automatique des Langues, Volume 59, Numéro 3 : Traitement automatique des langues peu dotées [NLP for Under-Resourced Languages]
Delphine Bernhard | Claudia Soria
Traitement Automatique des Langues, Volume 59, Numéro 3 : Traitement automatique des langues peu dotées [NLP for Under-Resourced Languages]

Traitement automatique des langues peu dotées [NLP for Under-Resourced Languages]
Delphine Bernhard | Claudia Soria
Traitement Automatique des Langues, Volume 59, Numéro 3 : Traitement automatique des langues peu dotées [NLP for Under-Resourced Languages]

2016

LREC as a Graph: People and Resources in a Network
Riccardo Del Gratta | Francesca Frontini | Monica Monachini | Gabriella Pardelli | Irene Russo | Roberto Bartolini | Fahad Khan | Claudia Soria | Nicoletta Calzolari
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This proposal describes a new way to visualise resources in the LREMap, a community-built repository of language resource descriptions and uses. The LREMap is represented as a force-directed graph, where resources, papers and authors are nodes. The analysis of the visual representation of the underlying graph is used to study how the community gathers around LRs and how LRs are used in research.

Fostering digital representation of EU regional and minority languages: the Digital Language Diversity Project
Claudia Soria | Irene Russo | Valeria Quochi | Davyth Hicks | Antton Gurrutxaga | Anneli Sarhimaa | Matti Tuomisto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Poor digital representation of minority languages further prevents their usability on digital media and devices. The Digital Language Diversity Project, a three-year project funded under the Erasmus+ programme, aims at addressing the problem of low digital representation of EU regional and minority languages by giving their speakers the intellectual and practical skills to create, share, and reuse online digital content. Availability of digital content and technical support to use it are essential prerequisites for the development of language-based digital applications, which in turn can boost digital usage of these languages. In this paper we introduce the project, its aims, objectives and current activities for sustaining digital usability of minority languages through adult education.

2012

The LRE Map. Harmonising Community Descriptions of Resources
Nicoletta Calzolari | Riccardo Del Gratta | Gil Francopoulo | Joseph Mariani | Francesco Rubino | Irene Russo | Claudia Soria
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Accurate and reliable documentation of Language Resources is an indisputable need: documentation is the gateway to discovery of Language Resources, a necessary step towards promoting the data economy. Language resources that are not documented virtually do not exist: for this reason every initiative able to collect and harmonise metadata about resources represents a valuable opportunity for the NLP community. In this paper we describe the LRE Map, reporting statistics on resources associated with LREC2012 papers and providing comparisons with LREC2010 data. The LRE Map, jointly launched by FLaReNet and ELRA in conjunction with the LREC 2010 Conference, is an instrument for enhancing the availability of information about resources, either new or already existing ones. It aims to reinforce and facilitate the use of standards in the community. The LRE Map web interface provides the possibility of searching according to a fixed set of metadata and of viewing the details of extracted resources. The LRE Map is continuing to collect bottom-up input about resources from authors of other conferences through the standard submission process. This will help broaden the notion of “language resources” and attract to the field neighboring disciplines that so far have been only marginally involved in the standard notion of language resources.

The FLaReNet Strategic Language Resource Agenda
Claudia Soria | Núria Bel | Khalid Choukri | Joseph Mariani | Monica Monachini | Jan Odijk | Stelios Piperidis | Valeria Quochi | Nicoletta Calzolari
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

The FLaReNet Strategic Agenda highlights the most pressing needs for the sector of Language Resources and Technologies and presents a set of recommendations for its development and progress in Europe, as issued from a three-year consultation of the FLaReNet European project. The FLaReNet recommendations are organised around nine dimensions: a) documentation b) interoperability c) availability, sharing and distribution d) coverage, quality and adequacy e) sustainability f) recognition g) development h) infrastructure and i) international cooperation. As such, they cover a broad range of topics and activities, spanning production and use of language resources, licensing, maintenance and preservation issues, infrastructures for language resources, resource identification and sharing, evaluation and validation, interoperability and policy issues. The intended recipients belong to a large set of players and stakeholders in Language Resources and Technology, ranging from individuals to research and education institutions, to policy-makers, funding agencies, SMEs and large companies, service and media providers. The main goal of these recommendations is to serve as an instrument to support stakeholders in planning for and addressing the urgencies of the Language Resources and Technologies of the future.

2010

Preparing the field for an Open Resource Infrastructure: the role of the FLaReNet Network of Excellence
Nicoletta Calzolari | Claudia Soria
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In order to overcome the fragmentation that affects the field of Language Resources and Technologies, an Open and Distributed Resource Infrastructure is the necessary step for building on each other's achievements, integrating resources and technologies and avoiding dispersed or conflicting efforts. Since this endeavour represents a true cultural turning point in the LRs field, it needs careful preparation, both in terms of acceptance by the community and thoughtful investigation of the various technical, organisational and practical aspects implied. To achieve this, we need to act as a community able to join forces on a set of shared priorities and we need to act at a worldwide level. FLaReNet (Fostering Language Resources Network) is a Thematic Network funded under the EU eContent program that aims at developing the needed common vision and fostering a European and International strategy for consolidating the sector, thus enhancing competitiveness at EU level and worldwide. In this paper we present the activities undertaken by FLaReNet in order to prepare and support the establishment of such an Infrastructure, which is now becoming a reality within the new MetaNet initiative.

The LREC Map of Language Resources and Technologies
Nicoletta Calzolari | Claudia Soria | Riccardo Del Gratta | Sara Goggi | Valeria Quochi | Irene Russo | Khalid Choukri | Joseph Mariani | Stelios Piperidis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper we present the LREC Map of Language Resources and Tools, an innovative feature introduced with this LREC. The purpose of the Map is to shed light on the vast amount of resources and tools that represent the background of the research presented at LREC, in an attempt to fill a gap in the community's knowledge about the resources and tools that are used or created worldwide. It also aims at a change of culture in the field, actively engaging each researcher in the task of documenting resources. The Map has been developed on the basis of the information provided by LREC authors during the submission of papers to the LREC 2010 conference and the LREC workshops, and contains information about almost 2000 resources. The paper illustrates the motivation behind this initiative, its main characteristics, its relevance and future impact in the field, the metadata used to describe the resources, and finally presents some of the most relevant findings.

Towards an ISO Standard for Dialogue Act Annotation
Harry Bunt | Jan Alexandersson | Jean Carletta | Jae-Woong Choe | Alex Chengyu Fang | Koiti Hasida | Kiyong Lee | Volha Petukhova | Andrei Popescu-Belis | Laurent Romary | Claudia Soria | David Traum
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes an ISO project which aims at developing a standard for annotating spoken and multimodal dialogue with semantic information concerning the communicative functions of utterances, the kind of semantic content they address, and their relations with what was said and done earlier in the dialogue. The project, ISO 24617-2 "Semantic annotation framework, Part 2: Dialogue acts", is currently at DIS stage. The proposed annotation schema distinguishes 9 orthogonal dimensions, allowing each functional segment in dialogue to have a function in each of these dimensions, thus accounting for the multifunctionality that utterances in dialogue often have. A number of core communicative functions is defined in the form of ISO data categories, available at http://semantic-annotation.uvt.nl/dialogue-acts/iso-datcats.pdf; they are divided into "dimension-specific" functions, which can be used only in a particular dimension, such as Turn Accept in the Turn Management dimension, and "general-purpose" functions, which can be used in any dimension, such as Inform and Request. An XML-based annotation language, "DiAML", is defined, with an abstract syntax, a semantics, and a concrete syntax.

An LMF-based Web Service for Accessing WordNet-type Semantic Lexicons
Bora Savas | Yoshihiko Hayashi | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes a Web service for accessing WordNet-type semantic lexicons. The central idea behind the service design is: given a query, the primary functionality of lexicon access is to present a partial lexicon by extracting the relevant part of the target lexicon. Based on this idea, we implemented the system as a RESTful Web service whose input query is specified by the access URI and whose output is presented in a standardized XML data format. LMF, an ISO standard for modeling lexicons, plays the most prominent role: the access URI pattern basically reflects the lexicon structure as defined by LMF; the access results are rendered based on Wordnet-LMF, which is a version of LMF XML-serialization. The Web service currently provides access to Princeton WordNet, Japanese WordNet, as well as the EDR Electronic Dictionary as a trial. To accommodate the EDR dictionary within the same framework, we modeled it also as a WordNet-type semantic lexicon. This paper thus discusses possible alternatives for modeling innately bilingual/multilingual lexicons like EDR with LMF, and proposes possible revisions to Wordnet-LMF.

2009

The SILT and FlaReNet International Collaboration for Interoperability
Nancy Ide | James Pustejovsky | Nicoletta Calzolari | Claudia Soria
Proceedings of the Third Linguistic Annotation Workshop (LAW III)

Query Expansion using LMF-Compliant Lexical Resources
Takenobu Tokunaga | Dain Kaplan | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Virach Sornlertlamvanich | Thatsanee Charoenporn | Yingju Xia | Chu-Ren Huang | Shu-Kai Hsieh | Kiyoaki Shirai
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

The FLaReNet Thematic Network: A Global Forum for Cooperation
Nicoletta Calzolari | Claudia Soria
Proceedings of the 7th Workshop on Asian Language Resources (ALR7)

2008

Adapting International Standard for Asian Language Technologies
Takenobu Tokunaga | Dain Kaplan | Chu-Ren Huang | Shu-Kai Hsieh | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Kiyoaki Shirai | Virach Sornlertlamvanich | Thatsanee Charoenporn | YingJu Xia
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Corpus-based approaches and statistical approaches have been the mainstream of natural language processing research for the past two decades. Language resources play a key role in such approaches, but there is an insufficient amount of language resources in many Asian languages. In this situation, standardisation of language resources would be of great help in developing resources in new languages. This paper presents the latest development efforts of our project which aims at creating a common standard for Asian language resources that is compatible with an international standard. In particular, the paper focuses on i) lexical specification and data categories relevant for building multilingual lexical resources for Asian languages; ii) a core upper-layer ontology needed for ensuring multilingual interoperability and iii) the evaluation platform used to test the entire architectural framework.

UFRA: a UIMA-based Approach to Federated Language Resource Architecture
Riccardo Del Gratta | Roberto Bartolini | Tommaso Caselli | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper we address the issue of developing an interoperable infrastructure for language resources and technologies. In our approach, called UFRA, we extend the Federated Database Architecture System, adding typical functionalities coming from UIMA. In this way, we capitalize on the advantages of a federated architecture, such as autonomy, heterogeneity and distribution of components, monitored by a central authority responsible for checking both the integration of components and user rights on performing different tasks. We use the UIMA approach to manage and define one common front-end, enabling users and clients to query, retrieve and use language resources and technologies. The purpose of this paper is to show how UIMA leads from a Federated Database Architecture to a Federated Resource Architecture, adding to a registry of available components both static resources such as lexicons and corpora and dynamic ones such as tools and general purpose language technologies. At the end of the paper, we present a case-study that adopts this framework to integrate the SIMPLE lexicon and TIMEML annotation guidelines to tag natural language texts.

Ontologizing Lexicon Access Functions based on an LMF-based Lexicon Taxonomy
Yoshihiko Hayashi | Chiharu Narawa | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper discusses ontologization of lexicon access functions in the context of a service-oriented language infrastructure, such as the Language Grid. In such a language infrastructure, an access function to a lexical resource, embodied as an atomic Web service, plays a crucially important role in composing a composite Web service tailored to a user’s specific requirement. To facilitate the composition process involving service discovery, planning and invocation, the language infrastructure should be ontology-based; hence the ontologization of a range of lexicon functions is strongly required. In a service-oriented environment, lexical resources, however, can be classified from a service-oriented perspective rather than from a lexicographically motivated standard. Hence, to address the issue of interoperability, the taxonomy for lexical resources should be grounded in a principled and shared lexicon ontology. To do this, we have ontologized the standardized lexicon modeling framework LMF, and utilized it as a foundation to stipulate the service-oriented lexicon taxonomy and the corresponding ontology for lexicon access functions. This paper also examines a possible solution to fill the gap between the ontological descriptions and the actual Web service API by adopting a W3C recommendation, SAWSDL, with which Web service descriptions can be linked with the domain ontology.

2006

Moving to dynamic computational lexicons with LeXFlow
Claudia Soria | Maurizio Tesconi | Francesca Bertagna | Nicoletta Calzolari | Andrea Marchetti | Monica Monachini
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we present LeXFlow, a web application framework where lexicons already expressed in a standardised format semi-automatically interact by reciprocally enriching themselves. LeXFlow is intended, on the one hand, for paving the way to the development of dynamic multi-source lexicons and, on the other, for fostering the adoption of standards. Borrowing from techniques used in the domain of document workflows, we model the activity of lexicon management as a particular case of workflow instance, where lexical entries move across agents and become dynamically updated. To this end, we have designed a lexical flow (LF) corresponding to the scenario where an entry of a lexicon A becomes enriched via basically two steps. First, by virtue of being mapped onto a corresponding entry belonging to a lexicon B, the entry(LA) inherits the semantic relations available in lexicon B. Second, by resorting to an automatic application that acquires information about semantic relations from corpora, the relations acquired are integrated into the entry and proposed to the human encoder. In addition, as a result of the lexical flow, for each starting lexical entry(LA) mapped onto a corresponding entry(LB) the flow produces a new entry representing the merging of the original two.

Language Resources Production Models: the Case of the INTERA Multilingual Corpus and Terminology
Maria Gavrilidou | Penny Labropoulou | Stelios Piperidis | Voula Giouli | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Khalid Choukri
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper reports on the multilingual Language Resources (MLRs), i.e. parallel corpora and terminological lexicons for less widely digitally available languages, that have been developed in the INTERA project and the methodology adopted for their production. Special emphasis is given to the reality factors that have influenced the MLRs development approach and their final constitution. Building on the experience gained in the project, a production model has been elaborated, suggesting ways and techniques that can be exploited in order to improve LRs production taking into account realistic issues.

Lexical Markup Framework (LMF)
Gil Francopoulo | Monte George | Nicoletta Calzolari | Monica Monachini | Nuria Bel | Mandy Pet | Claudia Soria
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Optimizing the production, maintenance and extension of lexical resources is one of the crucial aspects impacting Natural Language Processing (NLP). A second aspect involves optimizing the process leading to their integration in applications. In this respect, we believe that the production of a consensual specification on lexicons can be a useful aid for the various NLP actors. Within ISO, the purpose of LMF is to define a standard for lexicons. LMF is a model that provides a common standardized framework for the construction of NLP lexicons. The goals of LMF are to provide a common model for the creation and use of lexical resources, to manage the exchange of data between and among these resources, and to enable the merging of a large number of individual electronic resources to form extensive global electronic resources. In this paper, we describe the work in progress within the sub-group ISO-TC37/SC4/WG4. Various experts from many countries have been consulted in order to take into account best practices in many languages for (we hope) all kinds of NLP lexicons.

Next Generation Language Resources using Grid
Federico Calzolari | Eva Sassolini | Manuela Sassi | Sebastiana Cucurullo | Eugenio Picchi | Francesca Bertagna | Alessandro Enea | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper presents a case study concerning the challenges and requirements posed by next generation language resources, realized as an overall model of open, distributed and collaborative language infrastructure. If a sort of “new paradigm” for language resource sharing is required, we think that the emerging and still evolving technology connected to Grid computing is a very interesting and suitable one for a concrete realization of this vision. Given the current limitations of Grid computing, it is very important to test the new environment on basic language analysis tools, in order to get a sense of the potential and the possible limitations connected to its use in NLP. For this reason, we have carried out some experiments on a module of the Linguistic Miner, i.e. the extraction of linguistic patterns from restricted domain corpora. The Grid environment has produced the expected results (reduction of the processing time, huge storage capacity, data redundancy) without any additional cost for the final user.

Infrastructure for Standardization of Asian Language Resources
Takenobu Tokunaga | Virach Sornlertlamvanich | Thatsanee Charoenporn | Nicoletta Calzolari | Monica Monachini | Claudia Soria | Chu-Ren Huang | YingJu Xia | Hao Yu | Laurent Prevot | Kiyoaki Shirai
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

LeXFlow: A System for Cross-Fertilization of Computational Lexicons
Maurizio Tesconi | Andrea Marchetti | Francesca Bertagna | Monica Monachini | Claudia Soria | Nicoletta Calzolari
Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions

Lexical Markup Framework (LMF) for NLP Multilingual Resources
Gil Francopoulo | Nuria Bel | Monte George | Nicoletta Calzolari | Monica Monachini | Mandy Pet | Claudia Soria
Proceedings of the Workshop on Multilingual Language Resources and Interoperability

Towards Agent-based Cross-Lingual Interoperability of Distributed Lexical Resources
Claudia Soria | Maurizio Tesconi | Andrea Marchetti | Francesca Bertagna | Monica Monachini | Chu-Ren Huang | Nicoletta Calzolari
Proceedings of the Workshop on Multilingual Language Resources and Interoperability

2004

Semantic Mark-up of Italian Legal Texts Through NLP-based Techniques
Roberto Bartolini | Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli | Claudia Soria
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Advanced Tools for the Study of Natural Interactivity
Claudia Soria | Niels Ole Bernsen | Niels Cadée | Jean Carletta | Laila Dybkjær | Stefan Evert | Ulrich Heid | Amy Isard | Mykola Kolodnytsky | Christoph Lauer | Wolfgang Lezius | Lucas P.J.J. Noldus | Vito Pirrelli | Norbert Reithinger | Andreas Vögele
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

ADAM: The SI-TAL Corpus of Annotated Dialogues
Roldano Cattoni | Morena Danieli | Vanessa Sandrini | Claudia Soria
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

The Italian Lexical Sample Task
Francesca Bertagna | Claudia Soria | Nicoletta Calzolari
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

2000

ADAM- An Architecture for xml-based Dialogue Annotation on Multiple levels
Claudia Soria | Roldano Cattoni | Morena Danieli
1st SIGdial Workshop on Discourse and Dialogue

Where Opposites Meet. A Syntactic Meta-scheme for Corpus Annotation and Parsing Evaluation
Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli | Claudia Soria
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

A recognition-based meta-scheme for dialogue acts annotation
Claudia Soria | Vito Pirrelli
Towards Standards and Tools for Discourse Tagging

FAME: a Functional Annotation Meta-scheme for multi-modal and multi-lingual Parsing Evaluation
Alessandro Lenci | Simonetta Montemagni | Vito Pirrelli | Claudia Soria
Computer Mediated Language Assessment and Evaluation in Natural Language Processing

1998

Lexical marking of discourse relations - some experimental findings
Claudia Soria | Giacomo Ferrari
Discourse Relations and Discourse Markers
