Johann Haller


2005

Sentiment Analysis for Issues Monitoring Using Linguistic Resources
Ecaterina Rascu | Kai Schirmer | Johann Haller
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

Sentiment analysis, which deals with the identification and evaluation of opinions towards a topic, a company, or a product, is an essential task within media analysis. It is used to study trends, determine the level of customer satisfaction, or warn immediately when unfavourable trends risk damaging the image of a company. In this paper we present an issues monitoring system which, besides text categorization, also performs an extensive sentiment analysis of online news and newsgroup postings. Input texts undergo a morpho-syntactic analysis, are indexed using a thesaurus, and are categorized into user-specific classes. During sentiment analysis, sentiment expressions are identified and subsequently associated with the established topics. After presenting the various components of the system and the linguistic resources used, we describe in detail SentA, its sentiment analysis component, and evaluate its performance.

2004

Using Weighted Abduction to Align Term Variant Translations in Bilingual Texts
Michael Carl | Ecaterina Rascu | Johann Haller
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Controlling Gender Equality with Shallow NLP Techniques
Michael Carl | Sandrine Garnier | Johann Haller | Anne Altmayer | Bärbel Miemietz
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Application of corpus-based techniques to Amharic texts
Sisay Fissaha | Johann Haller
Workshop on Machine Translation for Semitic languages: issues and approaches

A number of corpus-based techniques have been used in the development of natural language processing applications. One area in which these techniques have been applied extensively is lexical development. The current work is being undertaken in the context of a machine translation project in which lexical development activities constitute a significant portion of the overall task. In the first part, we applied corpus-based techniques to the extraction of collocations from an Amharic text corpus. Analysis of the output reveals important collocations that can usefully be incorporated in the lexicon. This is especially true for the extraction of idiomatic expressions. The patterns of idiom formation observed in a small, manually collected data set enabled the extraction of a large set of idioms which might otherwise be difficult or impossible to recognize. Furthermore, we present preliminary results of other corpus-based techniques currently under investigation, namely clustering and classification. The results show that clustering performed no better than the frequency baseline, whereas classification showed a clear performance improvement over it. This in turn suggests the need to carry out further experiments using larger sets of data and more contextual information.

1996

Multilint - a Technical Documentation System with Multilingual Intelligence
Johann Haller
Proceedings of Translating and the Computer 18

1994

Machine translation, ten years on: Discourse has yet to make a breakthrough
Ruslan Mitkov | Johann Haller
Proceedings of the Second International Conference on Machine Translation: Ten years on