Corina Dima


2024

Leveraging Wikidata for Biomedical Entity Linking in a Low-Resource Setting: A Case Study for German
Faizan E Mustafa | Corina Dima | Juan Ochoa | Steffen Staab
Proceedings of the 6th Clinical Natural Language Processing Workshop

Biomedical Entity Linking (BEL) is a challenging task for low-resource languages, due to the lack of appropriate resources: datasets, knowledge bases (KBs), and pre-trained models. In this paper, we propose an approach to create a biomedical knowledge base for German BEL using UMLS information from Wikidata, that provides good coverage and can be easily extended to further languages. As a further contribution, we adapt several existing approaches for use in the German BEL setup, and report on their results. The chosen methods include a sparse model using character n-grams, a multilingual biomedical entity linker, and two general-purpose text retrieval models. Our results show that a language-specific KB that provides good coverage leads to the most improvement in entity linking performance, irrespective of the model used. The finetuned German BEL model, the newly created UMLSWikidata KB, and the code to reproduce our results are publicly available.

2019

No Word is an Island—A Transformation Weighting Model for Semantic Composition
Corina Dima | Daniël de Kok | Neele Witte | Erhard Hinrichs
Transactions of the Association for Computational Linguistics, Volume 7

Composition models of distributional semantics are used to construct phrase representations from the representations of their words. Composition models are typically situated on two ends of a spectrum. They either have a small number of parameters but compose all phrases in the same way, or they perform word-specific compositions at the cost of a far larger number of parameters. In this paper we propose transformation weighting (TransWeight), a composition model that consistently outperforms existing models on nominal compounds, adjective-noun phrases, and adverb-adjective phrases in English, German, and Dutch. TransWeight drastically reduces the number of parameters needed compared with the best model in the literature by composing similar words in the same way.

2017

PP Attachment: Where do We Stand?
Daniël de Kok | Jianqiang Ma | Corina Dima | Erhard Hinrichs
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Prepositional phrase (PP) attachment is a well-known challenge to parsing. In this paper, we combine the insights of different works, namely: (1) treating PP attachment as a classification task with an arbitrary number of attachment candidates; (2) using auxiliary distributions to augment the data beyond the hand-annotated training set; (3) using topological fields to get information about the distribution of PP attachment throughout clauses; and (4) using state-of-the-art techniques such as word embeddings and neural networks. We show that jointly using these techniques leads to substantial improvements. We also conduct a qualitative analysis to gauge where the ceiling of the task is in a realistic setup.

Distributional regularities of verbs and verbal adjectives: Treebank evidence and broader implications
Daniël de Kok | Patricia Fischer | Corina Dima | Erhard Hinrichs
Proceedings of the 16th International Workshop on Treebanks and Linguistic Theories

2016

On the Compositionality and Semantic Interpretation of English Noun Compounds
Corina Dima
Proceedings of the 1st Workshop on Representation Learning for NLP

2015

Reverse-engineering Language: A Study on the Semantic Compositionality of German Compounds
Corina Dima
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Automatic Noun Compound Interpretation using Deep Neural Networks and Word Embeddings
Corina Dima | Erhard Hinrichs
Proceedings of the 11th International Conference on Computational Semantics

2014

How to Tell a Schneemann from a Milchmann: An Annotation Scheme for Compound-Internal Relations
Corina Dima | Verena Henrich | Erhard Hinrichs | Christina Hoppermann
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a language-independent annotation scheme for the semantic relations that link the constituents of noun-noun compounds, such as Schneemann ‘snow man’ or Milchmann ‘milk man’. The annotation scheme is hybrid in the sense that it assigns each compound a two-place label consisting of a semantic property and a prepositional paraphrase. The resulting inventory combines the insights of previous annotation schemes that rely exclusively on either semantic properties or prepositions, thus avoiding the known weaknesses that result from using only one of the two label types. The proposed annotation scheme has been used to annotate a set of 5112 German noun-noun compounds. A release of the dataset is currently being prepared and will be made available via the CLARIN Center Tübingen. In addition to presenting the hybrid annotation scheme, the paper also reports on an inter-annotator agreement study that showed substantial agreement among annotators.

2011

A Semi-Automatic, Iterative Method for Creating a Domain-Specific Treebank
Corina Dima | Erhard Hinrichs
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011