Melanie Siegel


2023

Connecting Multilingual Wordnets: Strategies for Improving ILI Classification in OdeNet
Melanie Siegel | Johann Bergh
Proceedings of the 12th Global Wordnet Conference

The Open Multilingual Wordnet (OMW) is an open source project that was launched with the goal of making it easy to use wordnets in multiple languages without having to pay expensive proprietary licensing costs. As OMW evolved, the interlingual indicator (ILI) was used to allow semantically equivalent synsets in different languages to be linked to each other. OdeNet is the German language wordnet which forms part of the OMW project. This paper analyses the shortcomings of the initial ILI classification in OdeNet and the methods subsequently used to improve this classification.

Towards Ukrainian WordNet: Incorporation of an Existing Thesaurus in the Domain of Physics
Melanie Siegel | Maksym Vakulenko | Jonathan Baum
Proceedings of the 19th Conference on Natural Language Processing (KONVENS 2023)

2022

DeTox: A Comprehensive Dataset for German Offensive Language and Conversation Analysis
Christoph Demus | Jonas Pitz | Mina Schütz | Nadine Probol | Melanie Siegel | Dirk Labudde
Proceedings of the Sixth Workshop on Online Abuse and Harms (WOAH)

In this work, we present a new publicly available offensive language dataset of 10,278 German social media comments collected in the first half of 2021 and annotated by a total of six annotators. With twelve different annotation categories, it is far more comprehensive than other datasets and goes beyond hate speech detection alone. In particular, the labels also cover toxicity, criminal relevance, and types of discrimination in comments. Furthermore, about half of the comments come from coherent parts of conversations, which makes it possible to consider the comments’ contexts and to analyse conversations in order to research the contagion of offensive language within them.

2021

OdeNet: Compiling a German WordNet from other Resources
Melanie Siegel | Francis Bond
Proceedings of the 11th Global Wordnet Conference

The Princeton WordNet for the English language has been used worldwide in NLP projects for many years. With the OMW initiative, wordnets for different languages of the world are being linked via identifiers. This parallel development and linking opens up new perspectives for multilingual applications. The development of a wordnet for the German language also takes place in this context. To save development time, existing resources were combined and recompiled. The result was then evaluated and improved. In a relatively short time, a resource was created that can be used in projects and continuously improved and extended.

DeTox at GermEval 2021: Toxic Comment Classification
Mina Schütz | Christoph Demus | Jonas Pitz | Nadine Probol | Melanie Siegel | Dirk Labudde
Proceedings of the GermEval 2021 Shared Task on the Identification of Toxic, Engaging, and Fact-Claiming Comments

In this work, we present our approaches to the toxic comment classification task (subtask 1) of the GermEval 2021 Shared Task. For this binary task, we propose three models: a German BERT transformer model; a multilayer perceptron, which was first trained in parallel on textual input and 14 additional linguistic features and then concatenated in an additional layer; and a multilayer perceptron with both feature types as input. We enhanced our pre-trained transformer model by re-training it with over 1 million tweets and fine-tuned it on two additional German datasets of similar tasks. The embeddings of the final fine-tuned German BERT were taken as the textual input features for our neural networks. Our best models on the validation data were both neural networks; however, our enhanced German BERT achieved a higher score on the test data, with an F1-score of 0.5895.

2020

Adding Pronunciation Information to Wordnets
Thierry Declerck | Lenka Bajcetic | Melanie Siegel
Proceedings of the LREC 2020 Workshop on Multimodal Wordnets (MMW2020)

We describe ongoing work on adding pronunciation information to wordnets, as such information can indicate specific senses of a word. Many wordnets associate only a lemma form and a part-of-speech tag with their senses. At the same time, we are aware that additional linguistic information can be useful for identifying a specific sense of a wordnet lemma when it is encountered in a corpus. While existing work already deals with the addition of grammatical number or grammatical gender information to wordnet lemmas, we are investigating the linking of wordnet lemmas to pronunciation information, thus adding a speech-related modality to wordnets.

2019

Using OntoLex-Lemon for Representing and Interlinking German Multiword Expressions in OdeNet and MMORPH
Thierry Declerck | Melanie Siegel | Stefania Racioppa
Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019)

We describe work on porting two large German lexical resources into the OntoLex-Lemon model in order to establish complementary interlinkings between them. One resource is OdeNet (Open German WordNet) and the other is a further development of the German version of the MMORPH morphological analyzer. We show how the Multiword Expressions (MWEs) contained in OdeNet can be morphologically specified by using the lexical representation and linking features of OntoLex-Lemon, which also support the formulation of restrictions on the usage of such expressions.

OntoLex as a possible Bridge between WordNets and full lexical Descriptions
Thierry Declerck | Melanie Siegel
Proceedings of the 10th Global Wordnet Conference

In this paper we describe our current work on representing a recently created German lexical semantics resource in OntoLex-Lemon and in conformance with WordNet specifications. Besides presenting the representation effort, we show how OntoLex-Lemon can be used to bridge from WordNet-like resources to full lexical descriptions and to extend the coverage of WordNets to other types of lexical data, such as decomposition results, exemplified for German data, and inflectional phenomena, outlined here for English data.

2012

Using Automatic Machine Translation Metrics to Analyze the Impact of Source Reformulations
Johann Roturier | Linda Mitchell | Robert Grabowski | Melanie Siegel
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper investigates the usefulness of automatic machine translation metrics when analyzing the impact of source reformulations on the quality of machine-translated user-generated content. We propose a novel framework to quickly identify rewriting rules which improve or degrade the quality of MT output, by trying to rely on automatic metrics rather than human judgments. We find that this approach allows us to quickly identify overlapping rules between two language pairs (English-French and English-German) and specific cases where the rules’ precision could be improved.

2006

Generating and Visualizing a Soccer Knowledge Base
Paul Buitelaar | Thomas Eigner | Greg Gulrajani | Alexander Schutz | Melanie Siegel | Nicolas Weber | Philipp Cimiano | Günter Ladwig | Matthias Mantel | Honggang Zhu
Demonstrations

Ontology-based Information Extraction with SOBA
Paul Buitelaar | Philipp Cimiano | Stefania Racioppa | Melanie Siegel
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we describe SOBA, a sub-component of the SmartWeb multi-modal dialog system. SOBA is a component for ontology-based information extraction from soccer web pages for automatic population of a knowledge base that can be used for domain-specific question answering. SOBA realizes a tight connection between the ontology, knowledge base and the information extraction component. The originality of SOBA is in the fact that it extracts information from heterogeneous sources such as tabular structures, text and image captions in a semantically integrated way. In particular, it stores extracted information in a knowledge base, and in turn uses the knowledge base to interpret and link newly extracted information with respect to already existing entities.

2005

Integration of a Lexical Type Database with a Linguistically Interpreted Corpus
Chikara Hashimoto | Francis Bond | Takaaki Tanaka | Melanie Siegel
Proceedings of the Sixth International Workshop on Linguistically Interpreted Corpora (LINC-2005)

Annotating Honorifics Denoting Social Ranking of Referents
Shigeko Nariyama | Hiromi Nakaiwa | Melanie Siegel
Proceedings of the Sixth International Workshop on Linguistically Interpreted Corpora (LINC-2005)

Open Source Machine Translation with DELPH-IN
Francis Bond | Stephan Oepen | Melanie Siegel | Ann Copestake | Dan Flickinger
Workshop on open-source machine translation

2004

The DeepThought Core Architecture Framework
Ulrich Callmeier | Andreas Eisele | Ulrich Schäfer | Melanie Siegel
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

An Integrated Architecture for Shallow and Deep Processing
Berthold Crysmann | Anette Frank | Bernd Kiefer | Stefan Mueller | Guenter Neumann | Jakub Piskorski | Ulrich Schaefer | Melanie Siegel | Hans Uszkoreit | Feiyu Xu | Markus Becker | Hans-Ulrich Krieger
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics

Efficient Deep Processing of Japanese
Melanie Siegel | Emily M. Bender
COLING-02: The 3rd Workshop on Asian Language Resources and International Standardization

Parallel Distributed Grammar Engineering for Practical Applications
Stephan Oepen | Emily M. Bender | Uli Callmeier | Dan Flickinger | Melanie Siegel
COLING-02: Grammar Engineering and Evaluation

2000

An HPSG-to-CFG Approximation of Japanese
Bernd Kiefer | Hans-Ulrich Krieger | Melanie Siegel
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

Japanese Honorification in an HPSG Framework
Melanie Siegel
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation

1999

The Syntactic Processing of Particles in Japanese Spoken Language
Melanie Siegel
Proceedings of the 13th Pacific Asia Conference on Language, Information and Computation

1996

Preferences and Defaults for Definiteness and Number in Japanese to German Machine Translation
Melanie Siegel
Proceedings of the 11th Pacific Asia Conference on Language, Information and Computation