Kosin Chamnongthai
2006
Semantic-Based Keyword Recovery Function for Keyword Extraction System
Rachada Kongkachandra | Kosin Chamnongthai
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The goal of a keyword extraction system is to bring precision and recall as close to 100% as possible. Both values depend on the number of extracted keywords, and two kinds of errors occur: false-rejected keywords and false-accepted keywords. To improve system performance, false-rejected keywords should be recovered and false-accepted keywords should be reduced. In this paper, we enhance conventional keyword extraction systems by attaching a keyword recovery function. This function recovers previously false-rejected keywords by comparing their semantic information with the contents of each relevant document. The function is automated in three processes: Domain Identification, Knowledge Base Generation, and Keyword Determination. The domain identification process identifies the domain of interest by searching the domain knowledge base with the extracted keywords; the most general domains are selected and used in the subsequent steps. To recover false-rejected keywords, the keyword determination process matches them against the keywords of the identified domain in the domain knowledge base on the basis of their semantics. To make this semantic matching possible, the knowledge base generation process first represents the definitions of the false-rejected keywords and the domain knowledge base as conceptual graphs. To evaluate the performance of the proposed function, EXTRACTOR, KEA, and our keyword-database-mapping-based keyword extractor are compared. The experiments were performed in two modes: training and recovering. In training mode, we use four glossaries from the Internet and 60 articles from the summary sections of IEICE Transactions. In recovering mode, we use 200 texts from three sources: the summary sections of 15 chapters of a computer textbook, and articles from the IEICE and ACM Transactions.
The experimental results show that the proposed function improves the precision and recall of conventional keyword extraction systems by approximately 3-5% and 6-10%, respectively.
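
To make the link between keyword recovery and the two metrics concrete, the following minimal sketch computes precision and recall before and after a single false-rejected keyword is recovered. The keyword sets and the helper name `precision_recall` are hypothetical illustrations, not data or code from the paper, and the recovery step itself (semantic matching over conceptual graphs) is not modeled here; the recovered keyword is simply assumed to be given.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of an extracted keyword set against a gold set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold-standard keywords for one document.
gold = {"keyword extraction", "conceptual graph", "knowledge base", "semantics"}

# Hypothetical output of a conventional extractor:
# "internet" is a false-accepted keyword, and
# "conceptual graph" and "semantics" are false-rejected keywords.
before = {"keyword extraction", "knowledge base", "internet"}

# Assume the recovery function restores one false-rejected keyword.
recovered = {"conceptual graph"}
after = before | recovered

p_before, r_before = precision_recall(before, gold)
p_after, r_after = precision_recall(after, gold)

print(f"before recovery: precision={p_before:.2f}, recall={r_before:.2f}")
print(f"after recovery:  precision={p_after:.2f}, recall={r_after:.2f}")
```

In this toy case, recovering a correct keyword raises both values because it adds a true positive to the extracted set; reducing false-accepted keywords would raise precision only.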