Atsushi Fujii

Reflecting the rapid growth of science, technology, and culture, it has become common practice to consult tools on the World Wide Web for various terms. Existing search engines provide an enormous volume of information, but retrieved information is not organized. Hand-compiled encyclopedias provide organized information, but the quantity of information is limited. In this paper, aiming to integrate the advantages of both tools, we propose a method to organize a search result based on multiple viewpoints as in Wikipedia. Because the viewpoints required for explanation differ depending on the type of a term, such as animal and disease, we model articles in Wikipedia to extract a viewpoint structure for each term type. To identify a set of term types, we independently use manual annotation and automatic document clustering for Wikipedia articles. We also propose an effective feature for clustering Wikipedia articles. We experimentally show that document clustering reduces the cost of manual annotation while maintaining the accuracy of modeling Wikipedia articles.

pdf bib

Enhancing Lemmatization for Mongolian and its Application to Statistical Machine Translation
Chimeddorj Odbayar | Atsushi Fujii
Proceedings of the 10th Workshop on Asian Language Resources

2010

pdf bib abs

Modeling Wikipedia Articles to Enhance Encyclopedic Search
Atsushi Fujii
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Reflecting the rapid growth of science, technology, and culture, it has become common practice to consult tools on the World Wide Web for various terms. Existing search engines provide an enormous volume of information, but retrieved information is not organized. Hand-compiled encyclopedias provide organized information, but the quantity of information is limited. To integrate the advantages of both tools, we have been proposing methods for encyclopedic search targeting information on the Web and patent information. In this paper, we propose a method to categorize multiple expository texts for a single term based on viewpoints. Because the viewpoints required for explanation differ depending on the type of a term, such as animals and diseases, it is difficult to manually produce a large-scale system. We use Wikipedia to extract a prototype of a viewpoint structure for each term type. We also use articles in Wikipedia for a machine learning method, which categorizes a given text into an appropriate viewpoint. We evaluate the effectiveness of our method experimentally.

2009

pdf bib

Exploiting Patent Information for the Evaluation of Machine Translation
Atsushi Fujii | Masao Utiyama | Mikio Yamamoto | Takehito Utsuro
Proceedings of the Third Workshop on Patent Translation

2008

pdf bib

Statistical Machine Translation based Passage Retrieval for Cross-Lingual Question Answering
Tomoyosi Akiba | Kei Shimizu | Atsushi Fujii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

pdf bib

A Lemmatization Method for Modern Mongolian and its Application to Information Retrieval
Badam-Osor Khaltar | Atsushi Fujii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf bib abs

Producing a Test Collection for Patent Machine Translation in the Seventh NTCIR Workshop
Atsushi Fujii | Masao Utiyama | Mikio Yamamoto | Takehito Utsuro
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

To support research and development on machine translation, we produced a test collection for Japanese-English machine translation in the seventh NTCIR Workshop. This paper describes details of our test collection. From patent documents published in Japan and the United States, we extracted patent families as a parallel corpus. A patent family is a set of patent documents for the same or related invention, and these documents are usually filed in more than one country in different languages. In the parallel corpus, we aligned Japanese sentences with their counterpart English sentences. Our test collection, which includes approximately 2,000,000 sentence pairs, can be used to train and test machine translation systems. Our test collection also includes search topics for cross-lingual patent retrieval, so the contribution of machine translation to a patent retrieval task can also be evaluated. Our test collection will be available to the public for research purposes after the NTCIR final meeting.

pdf bib abs

Producing an Encyclopedic Dictionary using Patent Documents
Atsushi Fujii
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Although the World Wide Web has lately become an important source to consult for the meaning of words, a number of technical terms related to high technology are not found on the Web. This paper describes a method to produce an encyclopedic dictionary for high-tech terms from patent information. We used a collection of unexamined patent applications published by the Japanese Patent Office as a source corpus. Given this collection, we extracted terms as headword candidates and retrieved applications including those headwords. Then, we extracted paragraph-style descriptions and categorized them into technical domains. We also extracted related terms for each headword. We have produced a dictionary including approximately 400,000 Japanese terms as headwords. We have also implemented an interface with which users can explore our dictionary by reading text descriptions and viewing a related-term graph.

pdf bib abs

Toward the Evaluation of Machine Translation Using Patent Information
Atsushi Fujii | Masao Utiyama | Mikio Yamamoto | Takehito Utsuro
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers

To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 2,000,000 sentence pairs in Japanese and English, which were extracted automatically from our parallel corpus. These sentence pairs can be used to train and evaluate machine translation systems. Our test collection also includes search topics for cross-lingual patent retrieval, which can be used to evaluate the contribution of machine translation to retrieving patent documents across languages. This paper describes our test collection, methods for evaluating machine translation, and preliminary experiments.

pdf bib

Effects of Related Term Extraction in Transliteration into Chinese
HaiXiang Huang | Atsushi Fujii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

2006

pdf bib

A System for Summarizing and Visualizing Arguments in Subjective Documents: Toward Supporting Decision Making
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the Workshop on Sentiment and Subjectivity in Text

pdf bib abs

Test Collections for Patent Retrieval and Patent Classification in the Fifth NTCIR Workshop
Atsushi Fujii | Makoto Iwayama | Noriko Kando
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the test collections produced for the Patent Retrieval Task in the Fifth NTCIR Workshop. We performed the invalidity search task, in which each participant group searches a patent collection for the patents that can invalidate the demand in an existing claim. For this purpose, we performed both document and passage retrieval tasks. We also performed the automatic patent classification task using the F-term classification system. The test collections will be available to the public for research purposes.

pdf bib

Extracting Loanwords from Mongolian Corpora and Producing a Japanese-Mongolian Bilingual Dictionary
Badam-Osor Khaltar | Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf bib

Modeling Impression in Probabilistic Transliteration into Chinese
LiLi Xu | Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing

pdf bib abs

Statistical Analysis for Thesaurus Construction using an Encyclopedic Corpus
Yasunori Ohishi | Katunobu Itou | Kazuya Takeda | Atsushi Fujii
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper proposes a discrimination method for hierarchical relations between word pairs. The method is a statistical one using an encyclopedic corpus extracted and organized from Web pages. In the proposed method, we use the statistical nature that hyponyms' descriptions tend to include hypernyms whereas hypernyms' descriptions do not include all of the hyponyms. Experimental results show that the method detected 61.7% of the relations in an actual thesaurus.
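The asymmetry described in this abstract can be illustrated with a minimal sketch. This is not the authors' statistic: the function names, the whole-word matching, and the `margin` threshold are all assumptions made here for illustration, and the input is assumed to be a mapping from lowercase terms to lists of description texts.

```python
def mention_rate(term, other, descriptions):
    """Fraction of `term`'s descriptions that mention `other` as a whole word."""
    texts = descriptions.get(term, [])
    if not texts:
        return 0.0
    return sum(other in t.lower().split() for t in texts) / len(texts)


def guess_hypernym(a, b, descriptions, margin=0.2):
    """Return the likely hypernym of the pair, or None if the asymmetry is weak.

    `margin` is a hypothetical threshold, not a value taken from the paper.
    """
    a_mentions_b = mention_rate(a, b, descriptions)
    b_mentions_a = mention_rate(b, a, descriptions)
    if a_mentions_b - b_mentions_a > margin:
        return b  # a's descriptions lean on b, so b is likely the broader term
    if b_mentions_a - a_mentions_b > margin:
        return a
    return None
```

Given descriptions in which most "sparrow" texts mention "bird" but only some "bird" texts mention "sparrow", `guess_hypernym("sparrow", "bird", descriptions)` would return `"bird"` regardless of argument order.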

2004

pdf bib

Summarizing Encyclopedic Term Descriptions on the Web
Atsushi Fujii | Tetsuya Ishikawa
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf bib

Term Extraction from Korean Corpora via Japanese
Atsushi Fujii | Tetsuya Ishikawa | Jong-Hyeok Lee
Proceedings of CompuTerm 2004: 3rd International Workshop on Computational Terminology

pdf bib

Collecting Spontaneously Spoken Queries for Information Retrieval
Tomoyosi Akiba | Atsushi Fujii | Katunobu Itou
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

pdf bib

Test Collections for Patent-to-Patent Retrieval and Patent Map Generation in NTCIR-4 Workshop
Atsushi Fujii | Makoto Iwayama | Noriko Kando
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

pdf bib

Overview of Patent Retrieval Task at NTCIR-3
Makoto Iwayama | Atsushi Fujii | Noriko Kando | Akihiko Takano
Proceedings of the ACL-2003 Workshop on Patent Corpus Processing

pdf bib abs

A system for Japanese/English/Korean multilingual patent retrieval
Mitsuharu Makita | Shigeto Higuchi | Atsushi Fujii | Tetsuya Ishikawa
Proceedings of Machine Translation Summit IX: System Presentations

In response to growing needs for cross-lingual patent retrieval, we propose PRIME (Patent Retrieval In Multilingual Environment system), in which users can retrieve and browse patents in foreign languages using only their native language. PRIME translates a query in the user language into the target language, retrieves patents relevant to the query, and translates retrieved patents into the user language. To update a translation dictionary, PRIME automatically extracts new translations from parallel patent corpora. In the current implementation, trilingual (J/E/K) patent retrieval is available. We describe the system design and its evaluation.

2002

pdf bib

A Method for Open-Vocabulary Speech-Driven Text Retrieval
Atsushi Fujii | Katunobu Itou | Tetsuya Ishikawa
Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP 2002)

pdf bib

A Probabilistic Method for Analyzing Japanese Anaphora Integrating Zero Pronoun Detection and Resolution
Kazuhiro Seki | Atsushi Fujii | Tetsuya Ishikawa
COLING 2002: The 19th International Conference on Computational Linguistics

pdf bib

Producing a Large-scale Encyclopedic Corpus over the Web
Atsushi Fujii | Katunobu Itou | Tetsuya Ishikawa
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

pdf bib

PRIME: a system for multi-lingual patent retrieval
Shigeto Higuchi | Masatoshi Fukui | Atsushi Fujii | Tetsuya Ishikawa
Proceedings of Machine Translation Summit VIII

pdf bib

Question Answering Using Encyclopedic Knowledge Generated from the Web
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the ACL 2001 Workshop on Open-Domain Question Answering

pdf bib

Organizing Encyclopedic Knowledge based on the Web and its Application to Question Answering
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics

2000

pdf bib abs

Applying machine translation to two-stage cross-language information retrieval
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers

Cross-language information retrieval (CLIR), where queries and documents are in different languages, needs a translation of queries and/or documents, so as to standardize both of them into a common representation. For this purpose, the use of machine translation is an effective approach. However, computational cost is prohibitive in translating large-scale document collections. To resolve this problem, we propose a two-stage CLIR method. First, we translate a given query into the document language, and retrieve a limited number of foreign documents. Second, we machine translate only those documents into the user language, and re-rank them based on the translation result. We also show the effectiveness of our method by way of experiments using Japanese queries and English technical documents.
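The two-stage procedure in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' system: the word-by-word dictionaries (`QUERY_DICT`, `DOC_DICT`) stand in for a real machine translation engine, the term-count `score` stands in for a real retrieval model, and the romanized words are hypothetical sample data.

```python
from collections import Counter

# Hypothetical toy dictionaries standing in for a real MT system.
QUERY_DICT = {"kensaku": "retrieval", "tokkyo": "patent"}  # user -> document language
DOC_DICT = {"retrieval": "kensaku", "patent": "tokkyo", "machine": "kikai"}  # document -> user language


def translate(text, dictionary):
    """Word-by-word lookup; unknown words are kept as-is."""
    return " ".join(dictionary.get(w, w) for w in text.lower().split())


def score(query, document):
    """Count occurrences of query terms in the document (a crude relevance score)."""
    counts = Counter(document.lower().split())
    return sum(counts[w] for w in set(query.lower().split()))


def two_stage_clir(query, documents, k=2):
    # Stage 1: translate the query and keep only the top-k foreign documents.
    translated_query = translate(query, QUERY_DICT)
    ranked = sorted(documents, key=lambda d: score(translated_query, d), reverse=True)
    candidates = ranked[:k]
    # Stage 2: translate only those candidates and re-rank them against the original query.
    translated = [(d, translate(d, DOC_DICT)) for d in candidates]
    translated.sort(key=lambda pair: score(query, pair[1]), reverse=True)
    return translated
```

The point of the design is that the expensive document translation in stage 2 runs on at most `k` documents rather than on the whole collection.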

pdf bib

Utilizing the World Wide Web as an Encyclopedia: Extracting Term Descriptions from Semi-Structured Texts
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

pdf bib

A Novelty-based Evaluation Method for Information Retrieval
Atsushi Fujii | Tetsuya Ishikawa
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)