David Pérez-Fernández

Also published as: David Perez Fernandez


2020

Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)
Doaa Samy | David Pérez-Fernández | Jerónimo Arenas-García
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

Legal-ES: A Set of Large Scale Resources for Spanish Legal Text Processing
Doaa Samy | Jerónimo Arenas-García | David Pérez-Fernández
Proceedings of the 1st Workshop on Language Technologies for Government and Public Administration (LT4Gov)

Legal-ES is an open-source resource kit for legal Spanish. It consists of a large-scale Spanish corpus of open legal texts and different kinds of language models, including word embeddings and topic models. The corpus includes over 1000 million words and covers a collection of open-access legislative and administrative documents in Spanish from different sources representing international, national and regional entities. The corpus is pre-processed and tokenized using spaCy. For the word embeddings, gensim was used on the collection of tokens, producing a representation space that is especially suited to reflect the inherent characteristics of the legal domain. We also compute topic models to obtain a convenient tool for understanding the main topics in the corpus and for navigating the documents by exploiting the semantic similarity among them. We will analyse the temporal structure of a dynamic topic model to infer changes in the legal production of the Spanish jurisdiction over the analysed time frame.

2018

ELRI - European Language Resources Infrastructure
Thierry Etchegoyhen | Borja Anza Porras | Andoni Azpeitia | Eva Martínez Garcia | Paulo Vale | José Luis Fonseca | Teresa Lynn | Jane Dunne | Federico Gaspari | Andy Way | Victoria Arranz | Khalid Choukri | Vladimir Popescu | Pedro Neiva | Rui Neto | Maite Melero | David Perez Fernandez | Antonio Branco | Ruben Branco | Luis Gomes
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

We describe the European Language Resources Infrastructure project, whose main aim is to provide an infrastructure that helps collect, prepare and share language resources that can in turn improve translation services in Europe.