Identifying Strategic Information from Scientific Articles through Sentence Classification
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
We address here the need to assist users in rapidly accessing the most important or strategic information in the text corpus by identifying sentences carrying specific information. More precisely, we want to identify contribution of authors of scientific papers through a categorization of sentences using rhetorical and lexical cues. We built local grammars to annotate sentences in the corpus according to their rhetorical status: objective, new things, results, findings, hypotheses, conclusion, related_word, future work. The annotation is automatically projected automatically onto two other corpora to test their portability across several domains. The local grammars are implemented in the Unitex system. After sentence categorization, the annotated sentences are clustered and users can navigate the result by accessing specific information types. The results can be used for advanced information retrieval purposes.
A task-oriented framework for evaluating theme detection systems: A discussion paper
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
This paper discusses the inherent difficulties in evaluating systems for theme detection. Such systems are based essentially on unsupervised clustering aiming to discover the underlying structure in a corpus of texts. As the structures are precisely unknown beforehand, it is difficult to devise a satisfactory evaluation protocol. Several problems are posed by cluster evaluation: determining the optimal number of clusters, cluster content evaluation, topology of the discovered structure. Each of these problems has been studied separately but some of the proposed metrics portray significant flaws. Moreover, no benchmark has been commonly agreed upon. Finally, it is necessary to distinguish between task-oriented and activity-oriented evaluation as the two frameworks imply different evaluation protocols. Possible solutions to the activity-oriented evaluation can be sought from the data and text mining communities.
Complex Structuring of Term Variants for Question Answering
Proceedings of the ACL 2003 Workshop on Multiword Expressions: Analysis, Acquisition and Treatment
Terminological variation, a means of identifying research topics from texts
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics
Terminological Variation, a Means of Identifying Research Topics from Texts
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1