Johannes Dellert


2018

Combining Information-Weighted Sequence Alignment and Sound Correspondence Models for Improved Cognate Detection
Johannes Dellert
Proceedings of the 27th International Conference on Computational Linguistics

Methods for automated cognate detection in historical linguistics invariably build on some measure of form similarity which is designed to capture the remaining systematic similarities between cognate word forms after thousands of years of divergence. A wide range of clustering and classification algorithms has been explored for the purpose, whereas possible improvements on the level of pairwise form similarity measures have not been the main focus of research. The approach presented in this paper improves on this core component of cognate detection systems by a novel combination of information weighting, a technique for putting less weight on reoccurring morphological material, with sound correspondence modeling by means of pointwise mutual information. In evaluations on expert cognacy judgments over a subset of the IPA-encoded NorthEuraLex database, the combination of both techniques is shown to lead to considerable improvements in average precision for binary cognate detection, and modest improvements for distance-based cognate clustering.

2008

TuLiPA: Towards a Multi-Formalism Parsing Environment for Grammar Engineering
Laura Kallmeyer | Timm Lichte | Wolfgang Maier | Yannick Parmentier | Johannes Dellert | Kilian Evang
Coling 2008: Proceedings of the workshop on Grammar Engineering Across Frameworks

TuLiPA: A syntax-semantics parsing environment for mildly context-sensitive formalisms
Yannick Parmentier | Laura Kallmeyer | Wolfgang Maier | Timm Lichte | Johannes Dellert
Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9)

Developing a TT-MCTAG for German with an RCG-based Parser
Laura Kallmeyer | Timm Lichte | Wolfgang Maier | Yannick Parmentier | Johannes Dellert
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Developing linguistic resources, in particular grammars, is known to be a complex task in itself, because of (among other things) redundancy and consistency issues. Furthermore, some languages prove hard to describe because of specific characteristics, e.g. the free word order in German. In this context, we present (i) a framework for describing tree-based grammars, and (ii) an actual fragment of a core multicomponent tree-adjoining grammar with tree tuples (TT-MCTAG) for German developed using this framework. This framework combines a metagrammar compiler and a parser based on range concatenation grammar (RCG) to check the consistency and the correctness of the grammar, respectively. The German grammar being developed within this framework already deals with a wide range of scrambling and extraction phenomena.

Ontology-Based XQuery’ing of XML-Encoded Language Resources on Multiple Annotation Layers
Georg Rehm | Richard Eckart | Christian Chiarcos | Johannes Dellert
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology provides a homogenising view on the conceptually different markup languages so that a common querying framework can be established using the method of ontology-based query expansion. In addition, we present a highly flexible web-based graphical interface that can be used to query corpora with regard to several different linguistic properties such as, for example, syntactic tree fragments. This interface can also be used for ontology-based querying of multiple corpora simultaneously.