David Cheng


2006

Sentiments on a Grid: Analysis of Streaming News and Views
Khurshid Ahmad | Lee Gillam | David Cheng
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we report on constructing a finite state automaton comprising automatically extracted terminology and significant collocation patterns from a training corpus of specialist news (Reuters Financial News). The automaton can be used to unambiguously identify sentiment-bearing words that might be able to make or break people, companies, perhaps even governments. The paper presents the emerging face of corpus linguistics, where a corpus is used to bootstrap both the terminology and the significant meaning-bearing patterns it contains. Many current content analysis software systems require a human coder to eyeball terms and sentiment words. Such an approach might yield very good results on small text collections, but it does not scale to a corpus of 40-50 million words, where a large-scale computer-based approach is required. We report on the use of Grid computing technologies and techniques to cope with this analysis.

2004

Text Corpora, Local Grammars and Prediction
Hayssam Traboulsi | David Cheng | Khurshid Ahmad
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)