Vladimir Popescu

2018

We describe the European Language Resources Infrastructure project, whose main aim is the provision of an infrastructure to help collect, prepare and share language resources that can in turn improve translation services in Europe.

New directions in ELRA activities
Valérie Mapelli | Victoria Arranz | Hélène Mazo | Pawel Kamocki | Vladimir Popescu
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Language Resource Citation: the ISLRN Dissemination and Further Developments
Valérie Mapelli | Vladimir Popescu | Lin Liu | Khalid Choukri
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This article presents the latest dissemination activities and technical developments carried out for the International Standard Language Resource Number (ISLRN) service. It also recalls the main principle and the submission process by which providers obtain their 13-digit ISLRN identifier. As of March 2016, 2100 Language Resources had been allocated an ISLRN: not only ELRA’s and LDC’s catalogued Language Resources, but also those of other important organisations, such as the Joint Research Centre (JRC) and the Resource Management Agency (RMA), which expressed their strong support for this initiative. In the research field, assigning a unique identification number is important, and referring to a Language Resource as an object per se (like a publication) has now become an obvious requirement. The ISLRN could also become an important parameter in computing a Language Resource Impact Factor (LRIF), in order to recognize the merits of the producers of Language Resources. Integrating the ISLRN into an LR-oriented bibliographical reference is thus part of the objective. The idea is to make use of a BibTeX entry that would take Language Resource items into account, including the ISLRN. Since the ISLRN is a requested field in LREC 2016 submissions, we expect that several other LRs will be allocated an ISLRN by the conference date. With this expansion, the ISLRN aims to become a widely used LR citation instrument in works referring to LRs.

Enhancing Cross-border EU E-commerce through Machine Translation: Needed Language Resources, Challenges and Opportunities
Meritxell Fernández Barrera | Vladimir Popescu | Antonio Toral | Federico Gaspari | Khalid Choukri
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper discusses the role that statistical machine translation (SMT) can play in the development of cross-border EU e-commerce, by highlighting extant obstacles and identifying relevant technologies to overcome them. To this end, it first proposes a typology of static and dynamic e-commerce textual genres and identifies those that may be more successfully targeted by SMT. The specific challenges concerning the automatic translation of user-generated content are discussed in detail. Secondly, the paper highlights the risk of data sparsity inherent to e-commerce and explores state-of-the-art strategies to achieve domain adequacy via adaptation. Thirdly, it proposes a robust workflow for the development of SMT systems adapted to the e-commerce domain by relying on inexpensive methods. Given the scarcity of user-generated language corpora for most language pairs, the paper proposes using crowdsourcing to obtain monolingual target-language data to train language models, and aligned parallel corpora to tune and evaluate MT systems.

Evaluation in Discourse: a Corpus-Based Study
Farah Benamara | Nicholas Asher | Yvette Yannick Mathieu | Vladimir Popescu | Baptiste Chardon
Dialogue & Discourse, Volume 7

This paper describes the CASOAR corpus, the first manually annotated corpus that explores the impact of discourse structure on sentiment analysis with a study of movie reviews in French and in English as well as letters to the editor in French. While annotating opinions at the expression, the sentence or the document level is a well-established task and relatively straightforward, discourse annotation remains difficult, especially for non-experts. Therefore, combining both annotations poses several methodological problems that we address here. We propose a multi-layered annotation scheme that includes: the complete discourse structure according to the Segmented Discourse Representation Theory, the opinion orientation of elementary discourse units and opinion expressions, and their associated features. We detail each layer, explore the interactions between them and discuss our results. In particular, we examine the correlation between discourse and semantic category of opinion expressions, the impact of discourse relations on both subjectivity and polarity analysis and the impact of discourse on the determination of the overall opinion of a document. Our results demonstrate that discourse is an important cue for sentiment analysis, at least for the corpus genres we have studied.

ELRA Activities and Services
Khalid Choukri | Valérie Mapelli | Hélène Mazo | Vladimir Popescu
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

After celebrating its 20th anniversary in 2015, ELRA is carrying on its strong involvement in the HLT field. To share ELRA’s expertise from the past 21 years, this article begins with a presentation of ELRA’s strategic Data and LR Management Plan, intended for wide use by the language communities. We then report on ELRA’s activities and the services it has provided since LREC 2014. In its cataloguing and licensing activities, ELRA has been active in moving the Meta-Share repository toward new development steps, supporting Europe in obtaining accurate LRs within the Connecting Europe Facility programme, promoting the use of LR citation, and creating the ELRA License Wizard web portal. The article further elaborates on the recent LR production activities covering various written, speech and video resources commissioned by public and private customers. In parallel, ELDA has also worked on several EU-funded projects centred on strategic issues related to the European Digital Single Market. The last part gives an overview of the latest dissemination activities, with a special focus on the celebration of ELRA’s 20th anniversary organised in Dubrovnik (Croatia), the follow-up of LREC, and the launch of the new ELRA portal.

The ELRA License Wizard
Valérie Mapelli | Vladimir Popescu | Lin Liu | Meritxell Fernández Barrera | Khalid Choukri
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

To allow an easy understanding of the various licenses that exist for the use of Language Resources (ELRA’s, META-SHARE’s, Creative Commons’, etc.), ELRA has developed a License Wizard to help right-holders share or distribute their resources under the appropriate license. It is also meant to help users better understand the legal obligations that apply in various licensing situations. The present paper elaborates on the functionalities of this web configurator, which makes it possible to select a number of legal features and obtain the user license adapted to the user’s selection, to define which user licenses right-holders would like to select in order to distribute their Language Resources, and to integrate the user license terms into a Distribution Agreement that could be proposed to ELRA or META-SHARE for further distribution through the ELRA Catalogue of Language Resources. Thanks to a flexible back office, the structure of the legal feature selection can easily be revised to include other features that may be relevant for other licenses. Integrating contributions from other initiatives is thus one of the obvious next steps, with a special focus on CLARIN and Linked Data experiences.

New Developments in the LRE Map
Vladimir Popescu | Lin Liu | Riccardo Del Gratta | Khalid Choukri | Nicoletta Calzolari
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper we describe the new developments brought to the LRE Map, especially regarding the user interface of the Web application, the searching of the information it contains, and the data model updates.

2013

Sentiment Composition Using a Parabolic Model
Baptiste Chardon | Farah Benamara | Yannick Mathieu | Vladimir Popescu | Nicholas Asher
Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013) – Long Papers

2012

Extraction de préférences à partir de dialogues de négociation (Towards Preference Extraction From Negotiation Dialogues) [in French]
Anaïs Cadilhac | Farah Benamara | Vladimir Popescu | Nicholas Asher | Mohamadou Seck
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN

How do Negation and Modality Impact on Opinions?
Farah Benamara | Baptiste Chardon | Yannick Mathieu | Vladimir Popescu | Nicholas Asher
Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics

2011

Towards Context-Based Subjectivity Analysis
Farah Benamara | Baptiste Chardon | Yannick Mathieu | Vladimir Popescu
Proceedings of 5th International Joint Conference on Natural Language Processing

2008

Contrôle rhétorique de l’ellipse sémantique en génération du langage pour le dialogue homme-machine à plusieurs locuteurs [Rhetorical Control of Semantic Ellipsis in Language Generation for Multi-Party Human-Computer Dialogue]
Vladimir Popescu | Jean Caelen | Corneliu Burileanu
Traitement Automatique des Langues, Volume 49, Numéro 1 : Varia [Varia]

Contrôle rhétorique de la génération des connecteurs concessifs en dialogue homme-machine [Rhetorical Control of the Generation of Concessive Connectives in Human-Computer Dialogue]
Vladimir Popescu | Jean Caelen
Actes de la 15ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Discourse connectives play an important role in the interpretation of discourse (whether dialogic or not); hence, when producing utterances, the choice of the words that link utterances (for example, in spoken dialogue) proves essential for ensuring that the speakers’ illocutionary aims are understood. In computational linguistics, the problem has mostly been addressed at the level of interpreting monologic discourse, whereas for dialogue, research has generally been limited to establishing an almost one-to-one correspondence between rhetorical relations and connectives. In this paper we propose a mechanism for guiding the generation of concessive connectives in dialogue, from both a discourse and a semantic point of view; each connective considered is constrained by a set of conditions that take into account discourse coherence and the semantic relevance of each word concerned. The discourse constraints, expressed in a formalism derived from SDRT (Segmented Discourse Representation Theory), are embedded in semantic constraints on connectives proposed by the Geneva school (Moeschler), in order finally to evaluate the coherence of the discourse resulting from the use of these connectives.

2007

Architecture modulaire portable pour la génération du langage naturel en dialogue homme-machine [Portable Modular Architecture for Natural Language Generation in Human-Computer Dialogue]
Vladimir Popescu
Actes de la 14ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues (Posters)

Natural language generation for spoken human-computer dialogue imposes specific constraints, such as the spontaneity and fragmented nature of utterances, the types of speakers, or the system’s response-time constraints. In this context, the question of a rigorously specified architecture arises, both at the level of the processing stages and modules involved and at the level of the interfaces between these modules. To allow almost complete freedom with respect to theoretical approaches, such an architecture must be both modular (that is, allow the processing levels to be independent of one another) and portable (that is, allow interoperability with modules designed according to standard architectures in natural language generation, such as the RAGS model, “Reference Architecture for Generation Systems”). Thus, in this article we concisely present the proposed architecture and then compare it with the RAGS model, in order to justify the design choices made. Next, the portability of the architecture is described through an extended example, whose generality lies in obtaining a set of rules for automatically mapping the information representations of our architecture to the RAGS model format and back. Finally, a set of conclusions and perspectives closes the article.

Using Speech Acts in Logic-Based Rhetorical Structuring for Natural Language Generation in Human-Computer Dialogue
Vladimir Popescu | Jean Caelen | Corneliu Burileanu
Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue