Antonio Pareja Lora

Also published as: Antonio Pareja-Lora

This research has focused on evaluating the existing open-source morphological analyzers for two of the most widely spoken indigenous macrolanguages in South America, namely Quechua and Aymara. Firstly, we have evaluated their performance (precision, recall and F1 score) for the individual languages for which they were developed (Cuzco Quechua and Aymara). Secondly, in order to assess how these tools handle other individual languages of the macrolanguage, we have extracted some sample text from school textbooks and educational resources. This sample text was edited in the different countries where these macrolanguages are spoken (Colombia, Ecuador, Peru, Bolivia, Chile and Argentina for Quechua; and Bolivia, Peru and Chile for Aymara), and it includes their different standardized forms (10 individual languages of Quechua and 3 of Aymara). Processing this text by means of the tools, we have (i) calculated their coverage (number of words recognized and analyzed) and (ii) studied in detail the cases for which each tool was unable to generate any output. Finally, we discuss different ways in which these tools could be optimized, either to improve their performance or, in the specific case of Quechua, to cover more individual languages of this macrolanguage in future work.

LITHME: Language in the Human-Machine Era
Maarit Koponen | Kais Allkivi-Metsoja | Antonio Pareja-Lora | Dave Sayers | Márta Seresi
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

The LITHME COST Action brings together researchers from various fields of study focusing on language and technology. We present the overall goals of LITHME and the network’s working groups focusing on diverse questions related to language and technology. As an example of the work of the LITHME network, we discuss the working group on language work and language professionals.

2020

Towards a Spell Checker for Zamboanga Chavacano Orthography
Marcelo Yuji Himoro | Antonio Pareja-Lora
Proceedings of the Twelfth Language Resources and Evaluation Conference

Zamboanga Chabacano (ZC) is the most vibrant variety of Philippine Creole Spanish, with over 400,000 native speakers in the Philippines (as of 2010). Following its introduction as a subject and a medium of instruction in the public schools of Zamboanga City from Grade 1 to 3 in 2012, an official orthography for this variety - the so-called “Zamboanga Chavacano Orthography” - was approved in 2014. Its complexity, however, is a barrier to most speakers, since it does not necessarily reflect the particular phonetic evolution in ZC, but favours etymology instead. The distance between the correct spelling and the different spelling variations is often so great that delivering acceptable performance with the current de facto spell checking technologies may be challenging. The goals of this research have been to propose i) a spelling error taxonomy for ZC, formalised as an ontology and ii) an adaptive spell checking approach using Character-Based Statistical Machine Translation to correct spelling errors in ZC. Our results show that this approach is suitable for the goals mentioned and that it could be combined with other current spell checking technologies to achieve even higher performance.

2016

The Open Linguistics Working Group (OWLG) brings together researchers from various fields of linguistics, natural language processing, and information technology to present and discuss principles, case studies, and best practices for representing, publishing and linking linguistic data collections. A major outcome of our work is the Linguistic Linked Open Data (LLOD) cloud, an LOD (sub-)cloud of linguistic resources, which covers various linguistic databases, lexicons, corpora, terminologies, and metadata repositories. We present and summarize five years of progress on the development of the cloud and of advancements in open data in linguistics, and we describe recent community activities. The paper aims to serve as a guideline to orient and involve researchers with the community and/or Linguistic Linked Open Data.

2014

Standardisation and Interoperation of Morphosyntactic and Syntactic Annotation Tools for Spanish and their Annotations
Antonio Pareja-Lora | Guillermo Cárcamo-Escorza | Alicia Ballesteros-Calvo
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Linguistic annotation tools and linguistic annotations are scarcely syntactically and/or semantically interoperable. Their low interoperability usually results from the number of factors taken into account in their development and design. These include (i) the type of phenomena annotated (morphosyntactic, syntactic, semantic, etc.); (ii) how these phenomena are annotated (e.g., the particular guidelines and/or schema used to encode the annotations); and (iii) the languages (Java, C++, etc.) and technologies (as standalone programs, as APIs, as web services, etc.) used to develop them. This low level of interoperability makes it difficult to reuse both the linguistic annotation tools and their annotations in new scenarios, e.g., in natural language processing (NLP) pipelines. At the same time, developing new linguistic tools from scratch is a highly time-consuming task that also entails a very high cost. Therefore, cost-effective ways to systematically reuse linguistic tools and annotations must be found urgently. A traditional way to overcome reuse and/or interoperability problems is standardisation. In this paper, we present a web service version of FreeLing that provides standard-compliant morpho-syntactic and syntactic annotations for Spanish, according to several ISO linguistic annotation standards and standard drafts.

2013

Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse
Antonio Pareja-Lora | Maria Liakata | Stefanie Dipper
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

Transforming the Data Transcription and Analysis Tool Metadata and Labels into a Linguistic Linked Open Data Cloud Resource
Antonio Pareja-Lora | María Blume | Barbara Lust
Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data

2010

Ontology-based Interoperation of Linguistic Tools for an Improved Lemma Annotation in Spanish
Antonio Pareja-Lora | Guadalupe Aguado de Cea
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we present an ontology-based methodology and architecture for the comparison, assessment, combination (and, to some extent, also contrastive evaluation) of the results of different linguistic tools. More specifically, we describe an experiment aiming at the improvement of the correctness of lemma tagging for Spanish. This improvement was achieved by means of the standardisation and combination of the results of three different linguistic annotation tools (Bitext's DataLexica, Connexor's FDG Parser and LACELL's POS tagger), using (1) ontologies, (2) a set of lemma tagging correction rules, determined empirically during the experiment, and (3) W3C standard languages, such as XML, RDF(S) and OWL. As we show in the results of the experiment, the interoperation of these tools by means of ontologies and the correction rules applied in the experiment significantly improved the quality of the resulting lemma tagging (when compared to the separate lemma tagging performed by each of the tools that we made interoperate).

2008

Ontology-Based Interface Specifications for a NLP Pipeline Architecture
Ekaterina Buyko | Christian Chiarcos | Antonio Pareja Lora
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The high level of heterogeneity between linguistic annotations usually complicates the interoperability of processing modules within an NLP pipeline. In this paper, a framework for the interoperation of NLP components, based on a data-driven architecture, is presented. Here, ontologies of linguistic annotation are employed to provide a conceptual basis for the tagset-neutral processing of linguistic annotations. The framework proposed here is based on a set of structured OWL ontologies: a reference ontology, a set of annotation models which formalize different annotation schemes, and a declarative linking between these, specified separately. This modular architecture is particularly scalable and flexible as it allows for the integration of different reference ontologies of linguistic annotations in order to overcome the absence of a consensus for an ontology of linguistic terminology. Our proposal originates from three lines of research from different fields: research on annotation type systems in UIMA; the ontological architecture OLiA, originally developed for sustainable documentation and annotation-independent corpus browsing; and the ontologies of the OntoTag model, targeted towards the processing of linguistic annotations in Semantic Web applications. We describe how UIMA annotations can be backed up by ontological specifications of annotation schemes as in the OLiA model, and how these are linked to the OntoTag ontologies, which allow for further ontological processing.

2004

OntoTag’s Linguistic Ontologies: Enhancing Higher Level and Semantic Web Annotations
Guadalupe Aguado de Cea | Inmaculada Álvarez-de-Mon | Antonio Pareja-Lora
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

RDF(S)/XML Linguistic Annotation of Semantic Web Pages
Guadalupe Aguado de Cea | Inmaculada Álvarez-de-Mon | Antonio Pareja-Lora | Rosario Plaza-Arteche
COLING-02: The 2nd Workshop on NLP and XML (NLPXML-2002)