Automatic Term Recognition Based on the Statistical Differences of Relative Frequencies in Different Corpora

Junko Kubo; Keita Tsuji; Shigeo Sugimoto

Abstract

In this paper, we propose a method for automatic term recognition (ATR) that uses the statistical differences between the relative frequencies of terms in the target domain corpus and in other domain corpora. Target terms generally appear more frequently in the target domain corpus than in other domain corpora, and exploiting this characteristic should improve extraction performance. However, most ATR methods proposed so far use only the target domain corpus and do not take this characteristic into account. In our extraction experiment, we used the abstracts of a women's studies journal as the target domain corpus and the abstracts of academic journals in 39 domains as the other domain corpora. The women's studies terms used to evaluate extraction were identified manually in the abstracts. Analysis of the extraction performance showed that our method outperformed earlier methods, namely methods based on C-value and FLR, as well as methods that also use other domain corpora.
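The core idea, scoring a candidate term by how much its relative frequency in the target corpus exceeds its relative frequency elsewhere, can be sketched as follows. The abstract does not name the statistic the authors use, so this sketch swaps in a standard two-proportion z-score as an illustrative stand-in; it is not the paper's method. The function name `relative_frequency_scores`, the pooling of all other domain corpora into one, and the toy counts are all hypothetical assumptions, and candidate terms are assumed to have been extracted and counted beforehand.

```python
import math
from collections import Counter


def relative_frequency_scores(target_counts, n_target, other_counts, n_other):
    """Two-proportion z-score for each candidate term in the target corpus.

    Hypothetical stand-in: the abstract does not name the statistic used.
    target_counts / other_counts map candidate terms to occurrence counts;
    n_target / n_other are corpus sizes in tokens (other corpora pooled).
    """
    scores = {}
    for term, x1 in target_counts.items():
        x2 = other_counts.get(term, 0)
        p1 = x1 / n_target  # relative frequency in the target domain corpus
        p2 = x2 / n_other   # relative frequency in the pooled other corpora
        # Pooled proportion under the null hypothesis that p1 == p2.
        p = (x1 + x2) / (n_target + n_other)
        se = math.sqrt(p * (1 - p) * (1 / n_target + 1 / n_other))
        # Positive score: relatively more frequent in the target corpus.
        scores[term] = (p1 - p2) / se if se > 0 else 0.0
    return scores


if __name__ == "__main__":
    # Hypothetical toy counts, not data from the paper.
    target = Counter({"gender": 30, "feminism": 12, "analysis": 20})
    other = Counter({"gender": 5, "feminism": 1, "analysis": 200})
    scores = relative_frequency_scores(target, 1000, other, 5000)
    for term in sorted(scores, key=scores.get, reverse=True):
        print(f"{term}\t{scores[term]:.2f}")
```

With these toy counts, a general word such as "analysis", which is relatively more frequent in the other corpora, receives a negative score and falls below the domain-specific candidates. A method that looks only at the target corpus cannot make this distinction.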

Anthology ID: L10-1239
Volume: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month: May
Year: 2010
Address: Valletta, Malta
Editors: Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue: LREC
Publisher: European Language Resources Association (ELRA)
URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/347_Paper.pdf
Cite (ACL): Junko Kubo, Keita Tsuji, and Shigeo Sugimoto. 2010. Automatic Term Recognition Based on the Statistical Differences of Relative Frequencies in Different Corpora. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal): Automatic Term Recognition Based on the Statistical Differences of Relative Frequencies in Different Corpora (Kubo et al., LREC 2010)