A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons

Satoshi Sato, Sayoko Kaide


Abstract
This paper proposes a simple and fast person-name filter, which plays an important role in automatic compilation of a large bilingual person-name lexicon. This filter is based on pn_score, which is the sum of two component scores, the score of the first name and that of the last name. Each score is calculated from two term sets: one is a dense set in which most of the members are person names; another is a baseline set that contains less person names. The pn_score takes one of five values, {+2, +1, 0, -1, -2}, which correspond to strong positive, positive, undecidable, negative, and strong negative, respectively. This pn_score can be easily extended to bilingual pn_score that takes one of nine values, by summing scores of two languages. Experimental results show that our method works well for monolingual person names in English and Japanese; the F-score of each language is 0.929 and 0.939, respectively. The performance of the bilingual person-name filter is better; the F-score is 0.955.
Anthology ID:
L10-1235
Volume:
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Month:
May
Year:
2010
Address:
Valletta, Malta
Editors:
Nicoletta Calzolari, Khalid Choukri, Bente Maegaard, Joseph Mariani, Jan Odijk, Stelios Piperidis, Mike Rosner, Daniel Tapias
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/343_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Satoshi Sato and Sayoko Kaide. 2010. A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta. European Language Resources Association (ELRA).
Cite (Informal):
A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons (Sato & Kaide, LREC 2010)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/343_Paper.pdf