2010
A Linguistically Grounded Graph Model for Bilingual Lexicon Extraction
Florian Laws | Lukas Michelbacher | Beate Dorow | Christian Scheible | Ulrich Heid | Hinrich Schütze
Coling 2010: Posters
Building a Cross-lingual Relatedness Thesaurus using a Graph Similarity Measure
Lukas Michelbacher | Florian Laws | Beate Dorow | Ulrich Heid | Hinrich Schütze
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
The Internet is an ever-growing source of information stored in documents in different languages. Hence, cross-lingual resources are needed for more and more NLP applications. This paper presents (i) a graph-based method for creating one such resource and (ii) a resource created using the method, a cross-lingual relatedness thesaurus. Given a word in one language, the thesaurus suggests words in a second language that are semantically related. The method requires two monolingual corpora and a basic dictionary. Our general approach is to build two monolingual word graphs, with nodes representing words and edges representing linguistic relations between words. A bilingual dictionary containing basic vocabulary provides seed translations relating nodes from both graphs. We then use an inter-graph node-similarity algorithm to discover related words. Evaluation with three human judges revealed that 49% of the English and 57% of the German words discovered by our method are semantically related to the target words. We publish two resources in conjunction with this paper: first, noun coordinations extracted from the German and English Wikipedias; second, the cross-lingual relatedness thesaurus, which can be used in experiments involving interactive cross-lingual query expansion.
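The approach in this abstract (two monolingual word graphs, seed translations, inter-graph node similarity) can be illustrated with a small sketch. The abstract does not name the similarity algorithm, so the SimRank-style propagation below is an assumption, as are the function names, the `decay` and `iterations` parameters, and the toy graphs; it is not the authors' implementation.

```python
from itertools import product


def cross_graph_similarity(graph_a, graph_b, seeds, decay=0.8, iterations=5):
    """SimRank-style similarity between nodes of two graphs (illustrative only).

    graph_a, graph_b: dict mapping each word to the set of its neighbours.
    seeds: set of (word_a, word_b) dictionary translations, pinned at 1.0.
    """
    sim = {pair: 0.0 for pair in product(graph_a, graph_b)}
    for pair in seeds:
        sim[pair] = 1.0

    for _ in range(iterations):
        updated = {}
        for a, b in sim:
            if (a, b) in seeds:
                updated[(a, b)] = 1.0  # seed translations stay fixed
                continue
            neigh_a, neigh_b = graph_a[a], graph_b[b]
            if not neigh_a or not neigh_b:
                updated[(a, b)] = 0.0
                continue
            # Two words are similar if their neighbours are similar.
            total = sum(sim[(i, j)] for i in neigh_a for j in neigh_b)
            updated[(a, b)] = decay * total / (len(neigh_a) * len(neigh_b))
        sim = updated
    return sim


def best_match(sim, word, graph_b):
    """Return the node of graph_b most similar to `word`."""
    return max(graph_b, key=lambda candidate: sim[(word, candidate)])


# Hypothetical toy graphs; edges stand in for linguistic relations.
english = {
    "house": {"garden", "door"}, "garden": {"house"}, "door": {"house"},
    "car": {"road"}, "road": {"car"},
}
german = {
    "Haus": {"Garten", "Tür"}, "Garten": {"Haus"}, "Tür": {"Haus"},
    "Auto": {"Straße"}, "Straße": {"Auto"},
}
# Seed dictionary covering only the "basic vocabulary".
seeds = {("garden", "Garten"), ("door", "Tür"), ("road", "Straße")}

similarity = cross_graph_similarity(english, german, seeds)
```

Here `best_match(similarity, "house", german)` selects "Haus" even though that pair is absent from the seed dictionary, because the neighbours of both words are linked by seed translations.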
2009
A Graph-Theoretic Algorithm for Automatic Extension of Translation Lexicons
Beate Dorow | Florian Laws | Lukas Michelbacher | Christian Scheible | Jason Utt
Proceedings of the Workshop on Geometrical Models of Natural Language Semantics
2006
Ongoing Developments in Automatically Adapting Lexical Resources to the Biomedical Domain
Dominic Widdows | Adil Toumouh | Beate Dorow | Ahmed Lehireche
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper describes a range of experiments using empirical methods to adapt the WordNet noun ontology for specific use in the biomedical domain. Our basic technique is to extract relationships between terms using the Ohsumed corpus, a large collection of abstracts from PubMed, and to compare the relationships extracted with those that would be expected for medical terms, given the structure of the WordNet ontology. The linguistic methods involve the use of a variety of lexicosyntactic patterns that enable us to extract pairs of coordinate noun terms, and also related groups of adjectives and nouns, using Markov clustering. This enables us in many cases to analyse ambiguous words and select the correct meaning for the biomedical domain. While results are often encouraging, the paper also highlights evident problems and drawbacks with the method, and outlines suggestions for future work.
2005
Automatic Extraction of Idioms using Graph Analysis and Asymmetric Lexicosyntactic Patterns
Dominic Widdows | Beate Dorow
Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition
2003
Discovering Corpus-Specific Word Senses
Beate Dorow | Dominic Widdows
10th Conference of the European Chapter of the Association for Computational Linguistics
2002
A Graph Model for Unsupervised Lexical Acquisition
Dominic Widdows | Beate Dorow
COLING 2002: The 19th International Conference on Computational Linguistics
Using Parallel Corpora to enrich Multilingual Lexical Resources
Dominic Widdows | Beate Dorow | Chiu-Ki Chan
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)