Christian Fäth


2020

pdf bib
Translation Inference by Concept Propagation
Christian Chiarcos | Niko Schenk | Christian Fäth
Proceedings of the 2020 Globalex Workshop on Linked Lexicography

This paper describes our contribution to the Third Shared Task on Translation Inference across Dictionaries (TIAD-2020). We describe an approach on translation inference based on symbolic methods, the propagation of concepts over a graph of interconnected dictionaries: Given a mapping from source language words to lexical concepts (e.g., synsets) as a seed, we use bilingual dictionaries to extrapolate a mapping of pivot and target language words to these lexical concepts. Translation inference is then performed by looking up the lexical concept(s) of a source language word and returning the target language word(s) for which these lexical concepts have the respective highest score. We present two instantiations of this system: One using WordNet synsets as concepts, and one using lexical entries (translations) as concepts. With a threshold of 0, the latter configuration is the second among participant systems in terms of F1 score. We also describe additional evaluation experiments on Apertium data, a comparison with an earlier approach based on embedding projection, and an approach for constrained projection that outperforms the TIAD-2020 vanilla system by a large margin.

pdf bib
The ACoLi Dictionary Graph
Christian Chiarcos | Christian Fäth | Maxim Ionov
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we report the release of the ACoLi Dictionary Graph, a large-scale collection of multilingual open source dictionaries available in two machine-readable formats, a graph representation in RDF, using the OntoLex-Lemon vocabulary, and a simple tabular data format to facilitate their use in NLP tasks, such as translation inference across dictionaries. We describe the mapping and harmonization of the underlying data structures into a unified representation, its serialization in RDF and TSV, and the release of a massive and coherent amount of lexical data under open licenses.

pdf bib
Recent Developments for the Linguistic Linked Open Data Infrastructure
Thierry Declerck | John Philip McCrae | Matthias Hartung | Jorge Gracia | Christian Chiarcos | Elena Montiel-Ponsoda | Philipp Cimiano | Artem Revenko | Roser Saurí | Deirdre Lee | Stefania Racioppa | Jamal Abdul Nasir | Matthias Orlikowsk | Marta Lanau-Coronas | Christian Fäth | Mariano Rico | Mohammad Fazleh Elahi | Maria Khvalchik | Meritxell Gonzalez | Katharine Cooney
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper we describe the contributions made by the European H2020 project “Prêt-à-LLOD” (‘Ready-to-use Multilingual Linked Language Data for Knowledge Services across Sectors’) to the further development of the Linguistic Linked Open Data (LLOD) infrastructure. Prêt-à-LLOD aims to develop a new methodology for building data value chains applicable to a wide range of sectors and applications and based around language resources and language technologies that can be integrated by means of semantic technologies. We describe the methods implemented for increasing the number of language data sets in the LLOD. We also present the approach for ensuring interoperability and for porting LLOD data sets and services to other infrastructures, as well as the contribution of the projects to existing standards.

pdf bib
Annotation Interoperability for the Post-ISOCat Era
Christian Chiarcos | Christian Fäth | Frank Abromeit
Proceedings of the 12th Language Resources and Evaluation Conference

With this paper, we provide an overview over ISOCat successor solutions and annotation standardization efforts since 2010, and we describe the low-cost harmonization of post-ISOCat vocabularies by means of modular, linked ontologies: The CLARIN Concept Registry, LexInfo, Universal Parts of Speech, Universal Dependencies and UniMorph are linked with the Ontologies of Linguistic Annotation and through it with ISOCat, the GOLD ontology, the Typological Database Systems ontology and a large number of annotation schemes.

pdf bib
Fintan - Flexible, Integrated Transformation and Annotation eNgineering
Christian Fäth | Christian Chiarcos | Björn Ebbrecht | Maxim Ionov
Proceedings of the 12th Language Resources and Evaluation Conference

We introduce the Flexible and Integrated Transformation and Annotation eNgeneering (Fintan) platform for converting heterogeneous linguistic resources to RDF. With its modular architecture, workflow management and visualization features, Fintan facilitates the development of complex transformation pipelines by integrating generic RDF converters and augmenting them with extended graph processing capabilities: Existing converters can be easily deployed to the system by means of an ontological data structure which renders their properties and the dependencies between transformation steps. Development of subsequent graph transformation steps for resource transformation, annotation engineering or entity linking is further facilitated by a novel visual rendering of SPARQL queries. A graphical workflow manager allows to easily manage the converter modules and combine them to new transformation pipelines. Employing the stream-based graph processing approach first implemented with CoNLL-RDF, we address common challenges and scalability issues when transforming resources and showcase the performance of Fintan by means of a purely graph-based transformation of the Universal Morphology data to RDF.

pdf bib
Annohub – Annotation Metadata for Linked Data Applications
Frank Abromeit | Christian Fäth | Luis Glaser
Proceedings of the 7th Workshop on Linked Data in Linguistics (LDL-2020)

We introduce a new dataset for the Linguistic Linked Open Data (LLOD) cloud that will provide metadata about annotation and language information harvested from annotated language resources like corpora freely available on the internet. To our knowledge annotation metadata is not provided by any metadata provider, e.g. linghub, datahub or CLARIN so far. On the other hand, language metadata that is found on such portals is rarely provided in machine-readable form, especially as Linked Data. In this paper, we describe the harvesting process, content and structure of the new dataset and its application in the Lin|gu|is|tik portal, a research platform for linguists. Aside from that, we introduce tools for the conversion of XML encoded language resources to the CoNLL format. The generated RDF data as well as the XML-converter application are made public under an open license.

pdf bib
On the Linguistic Linked Open Data Infrastructure
Christian Chiarcos | Bettina Klimek | Christian Fäth | Thierry Declerck | John Philip McCrae
Proceedings of the 1st International Workshop on Language Technology Platforms

In this paper we describe the current state of development of the Linguistic Linked Open Data (LLOD) infrastructure, an LOD(sub-)cloud of linguistic resources, which covers various linguistic data bases, lexicons, corpora, terminology and metadata repositories.We give in some details an overview of the contributions made by the European H2020 projects “Prêt-à-LLOD” (‘Ready-to-useMultilingual Linked Language Data for Knowledge Services across Sectors’) and “ELEXIS” (‘European Lexicographic Infrastructure’) to the further development of the LLOD.

2018

pdf bib
Universal Morphologies for the Caucasus region
Christian Chiarcos | Kathrin Donandt | Maxim Ionov | Monika Rind-Pawlowski | Hasmik Sargsian | Jesse Wichers Schreur | Frank Abromeit | Christian Fäth
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Analyzing Middle High German Syntax with RDF and SPARQL
Christian Chiarcos | Benjamin Kosmehl | Christian Fäth | Maria Sukhareva
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

pdf bib
Interoperability of Language-related Information: Mapping the BLL Thesaurus to Lexvo and Glottolog
Vanya Dimitrova | Christian Fäth | Christian Chiarcos | Heike Renner-Westermann | Frank Abromeit
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

pdf bib
Lin|gu|is|tik: Building the Linguist’s Pathway to Bibliographies, Libraries, Language Resources and Linked Open Data
Christian Chiarcos | Christian Fäth | Heike Renner-Westermann | Frank Abromeit | Vanya Dimitrova
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces a novel research tool for the field of linguistics: The Lin|gu|is|tik web portal provides a virtual library which offers scientific information on every linguistic subject. It comprises selected internet sources and databases as well as catalogues for linguistic literature, and addresses an interdisciplinary audience. The virtual library is the most recent outcome of the Special Subject Collection Linguistics of the German Research Foundation (DFG), and also integrates the knowledge accumulated in the Bibliography of Linguistic Literature. In addition to the portal, we describe long-term goals and prospects with a special focus on ongoing efforts regarding an extension towards integrating language resources and Linguistic Linked Open Data.