2022
Data Sets of Eating Disorders by Categorizing Reddit and Tumblr Posts: A Multilingual Comparative Study Based on Empirical Findings of Texts and Images
Christina Baskal, Amelie Elisabeth Beutel, Jessika Keberlein, Malte Ollmann, Esra Üresin, Jana Vischinski, Janina Weihe, Linda Achilles, Christa Womser-Hacker
Proceedings of the Workshop on Dataset Creation for Lower-Resourced Languages within the 13th Language Resources and Evaluation Conference
Research has shown the potential negative impact of social media usage on body image. Various platforms present numerous media formats of possibly harmful content related to eating disorders. People from different cultural backgrounds, represented, for example, by different languages, participate in the discussion online. Therefore, this research aims to investigate eating disorder-specific content in a multilingual and multimedia environment. We want to contribute to establishing a common ground for further automated approaches. Our first objective is to combine the two media formats, text and image, by classifying the posts from one social media platform (Reddit) and continuing the categorization in the second (Tumblr). Our second objective is the analysis of multilingualism. We worked qualitatively in an iterative, valid categorization process, followed by a comparison of the portrayal of eating disorders on both platforms. Our final data sets contained 960 Reddit and 2,081 Tumblr posts. Our analysis revealed that Reddit users predominantly exchange content regarding disease and eating behaviour, while on Tumblr, the focus is on the portrayal of oneself and one’s body.
2012
A Resource-light Approach to Phrase Extraction for English and German Documents from the Patent Domain and User Generated Content
Julia Maria Schulz, Daniela Becks, Christa Womser-Hacker, Thomas Mandl
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
In order to extract meaningful phrases from corpora (e.g. in an information retrieval context), intensive knowledge of the domain in question and the respective documents is generally needed. When moving to a new domain or language, the underlying knowledge bases and models need to be adapted, which is often time-consuming and labor-intensive. This paper addresses the described challenge of phrase extraction from documents in different domains and languages and proposes an approach that does not use comprehensive lexica and can therefore be easily transferred to new domains and languages. The effectiveness of the proposed approach is evaluated on user-generated content and documents from the patent domain in English and German.
2010
Multilingual Corpus Development for Opinion Mining
Julia Maria Schulz, Christa Womser-Hacker, Thomas Mandl
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Opinion Mining is a discipline that has attracted some attention lately. Most of the research in this field has been done for English or Asian languages, due to the lack of resources in other languages. In this paper we describe an approach to building a manually annotated multilingual corpus for the domain of product reviews, which can be used as a basis for fine-grained opinion analysis that also considers direct and indirect opinion targets. For each sentence in a review, the mentioned product features with their respective opinion polarity and strength on a scale from 0 to 3 are labelled manually by two annotators. The languages represented in the corpus are English, German and Spanish, and the corpus consists of about 500 product reviews per language. After a short introduction and a description of related work, we illustrate the annotation process, including a description of the annotation methodology and the tool developed for the annotation process. We then present first results on the inter-annotator agreement for opinions and product features. We conclude the paper with an outlook on future work.
2008
Analyzing Information Retrieval Results With a Focus on Named Entities
Thomas Mandl, Christa Womser-Hacker
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 1, March 2008: Special Issue on Cross-Lingual Information Retrieval and Question Answering
An Evaluation Resource for Geographic Information Retrieval
Thomas Mandl, Fredric Gey, Giorgio Di Nunzio, Nicola Ferro, Mark Sanderson, Diana Santos, Christa Womser-Hacker
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper we present an evaluation resource for geographic information retrieval developed within the Cross Language Evaluation Forum (CLEF). The GeoCLEF track is dedicated to the evaluation of geographic information retrieval systems. The resource encompasses more than 600,000 documents, 75 topics so far, and more than 100,000 relevance judgments for these topics. Geographic information retrieval requires an evaluation resource which represents realistic information needs and which is geographically challenging. Some experimental results and analysis are reported.
2002
Inside the Evaluation Process of the Cross-Language Evaluation Forum (CLEF): Issues of Multilingual Topic Creation and Multilingual Relevance Assessment
Michael Kluck, Christa Womser-Hacker
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)