José M. García Miguel

Also published as: José M. García-Miguel, José M. Garcia-Miguel


2026

The development of accurate syntactic parsers remains a challenge for low-resource languages. To overcome it, the literature has proposed leveraging syntactic annotations from typologically related languages. This work investigates the viability and adequacy of this approach for Galician, evaluating the use of annotations from major Romance languages as source data. Our methodology extends beyond standard automatic evaluation to incorporate a detailed error analysis, which precisely quantifies the effects of multilingual training and assesses the practical scalability of the method. The results establish the necessity of embedding models for effective cross-lingual transfer and demonstrate that even languages not particularly close can yield adequate parsers. This work confirms the benefits of cross-lingual data augmentation while delineating its scalability limits. Furthermore, the error analysis identifies specific, typologically conditioned grammatical dependencies that remain persistent challenges for accurate dependency parsing.

2016

CORILSE is a computerized corpus of Spanish Sign Language (Lengua de Signos Española, LSE). It consists of a set of recordings from different discourse genres by Galician signers living in the city of Vigo. In this paper we describe its annotation system, developed on the basis of pre-existing ones (mostly the model of the Auslan corpus). This includes primary annotation of id-glosses for manual signs, annotation of the non-manual component, and secondary annotation of grammatical categories and relations, because this corpus is being built for grammatical analysis, in particular of argument structures in LSE. Until now, the annotation has been done largely by hand, which is a slow and time-consuming task. The need to facilitate this process has led us to engage in the development of automatic or semi-automatic tools for manual and facial recognition. Finally, we also present the web repository that will make the corpus available to different types of users and will allow its exploitation for research purposes and other applications (e.g. teaching of LSE or design of tasks for signed language assessment).

2012

This paper presents the design of a Galician syntactic corpus with application to intonation modeling. A corpus of around 3,000 sentences was designed with variation in syntactic structure and in the number of accent groups, and recorded by a professional speaker to study the influence of these factors on prosodic structure.

2010

This is an overall description of ADESSE ("Base de datos de verbos, Alternancias de Diátesis y Esquemas Sintáctico-Semánticos del Español"), an online database (http://adesse.uvigo.es/) with syntactic and semantic information for all clauses in a corpus of Spanish. The manually annotated corpus has 1.5 million words, 159,000 clauses and 3,450 different verb lemmas. ADESSE is an expanded version of BDS ("Base de datos sintácticos del español actual"), which contains the grammatical features of verbs and verb arguments in the corpus. ADESSE has added semantic features such as verb sense, verb class and semantic role of arguments to enable a detailed syntactic and semantic corpus-based characterization of verb valency. Each verb entry in the database is described in terms of valency potential and valency realizations (diatheses). The former includes a set of semantic roles of participants in a particular event type and a classification into a conceptual hierarchy of process types. Valency realizations are described in terms of correspondences of voice, syntactic functions and categories, and semantic roles. Verb senses are discriminated at two levels: a more abstract level linked to a valency potential, and a more specific level of verb senses that takes into account particular lexical instantiations of arguments.