Huan-An Kao


2010

pdf bib
Comment Extraction from Blog Posts and Its Applications to Opinion Mining
Huan-An Kao | Hsin-Hsi Chen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Blog posts containing many personal experiences or perspectives toward specific subjects are useful. Blogs allow readers to interact with bloggers by placing comments on specific blog posts. The comments carry viewpoints of readers toward the targets described in the post, or supportive/non-supportive attitude toward the post. Comment extraction is challenging due to that there does not exist a unique template among all blog service providers. This paper proposes methods to deal with this problem. Firstly, the repetitive patterns and their corresponding blocks are extracted from input posts by pattern identification algorithm. Secondly, three filtering strategies, i.e., tag pattern loop filtering, rule overlap filtering, and longest rule first, are used to remove non-comment blocks. Finally, a comment/non-comment classifier is learned to distinguish comment blocks from non-comment blocks with 14 block-level features and 5 rule-level features. In the experiments, we randomly select 600 blog posts from 12 blog service providers. F-measure, recall, and precision are 0.801, 0.855, and 0.780, respectively, by using all of the three filtering strategies together with some selected features. The application of comment extraction to blog mining is also illustrated. We show how to identify the relevant opinionated objects ― say, opinion holders, opinions, and targets, from posts.