Elena Manishina


2016

In this demonstration, we propose a system that would allow blind users to get the first glance of a web page. The goal is to reduce the time needed to access the logical and thematic structure of the page and to encourage the development of high-level reading strategies. Our concept, called Tag Thunder, relies on a phase that segments the page into zones, followed by a step that represents each zone by a word or group of words, and then by a simultaneous vocalization of these representatives.
As data-driven approaches started to make their way into the Natural Language Generation (NLG) domain, the need to automate corpus building and extension became apparent. Corpus creation and extension in the data-driven NLG domain has traditionally involved manual paraphrasing, performed either by a group of experts or through crowd-sourcing. Building training corpora manually is a costly enterprise that requires a lot of time and human resources. We propose to automate the process of corpus extension by integrating automatically obtained synonyms and paraphrases. Our methodology allowed us to significantly increase the size of the training corpus and its level of variability (the number of distinct tokens and specific syntactic structures). Our extension solutions are fully automatic and require only some initial validation. The human evaluation results confirm that in many cases native speakers favor the outputs of the model built on the extended corpus.

2013