Hidekazu Oiwa


A Comparison of Entity Matching Methods between English and Japanese Katakana
Michiharu Yamashita | Hideki Awashima | Hidekazu Oiwa
Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology

Japanese Katakana is one component of the Japanese writing system and is used to express English terms, loanwords, and onomatopoeia in Japanese characters based on their phonemes. The main purpose of this research is to find the best entity matching methods between English and Katakana. We pose two research questions to clarify which types of entity matching systems work better than others. The first question is which transliteration should be used for conversion: English and Katakana terms must be transliterated into the same form before their string similarity can be computed. We consider five conversions: English to Katakana directly, Katakana to English directly, English to Katakana via phonemes, Katakana to English via phonemes, and both English and Katakana to phonemes. The second question is which similarity measure should be used for entity matching. To investigate this, we choose six measures: Overlap Coefficient, Cosine, Jaccard, Jaro-Winkler, Levenshtein, and the similarity of phoneme probabilities predicted by an RNN. Our results show that 1) matching using phonemes and conversion of Katakana to English work better than the other methods, and 2) phoneme similarity outperforms the other measures, while the relative performance of the other similarity scores varies with the data and models.
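The five string-based measures named in the abstract are standard. Below is a minimal Python sketch of them, assuming both entities have already been transliterated into the same form (for example, both into phoneme strings) and assuming character bigrams as the token unit for the set- and vector-based measures. The bigram tokenization and all function names are illustrative assumptions, not the paper's implementation, and the RNN-based phoneme-probability similarity is not reproduced here.

```python
import math
from collections import Counter


def bigrams(s: str) -> list[str]:
    """Character bigrams (assumed token unit); short strings fall back to themselves."""
    return [s[i:i + 2] for i in range(len(s) - 1)] or [s]


def overlap_coefficient(a: str, b: str) -> float:
    x, y = set(bigrams(a)), set(bigrams(b))
    return len(x & y) / min(len(x), len(y))


def jaccard(a: str, b: str) -> float:
    x, y = set(bigrams(a)), set(bigrams(b))
    return len(x & y) / len(x | y)


def cosine(a: str, b: str) -> float:
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    dot = sum(x[k] * y[k] for k in x)
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    return dot / (norm_x * norm_y)


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance normalized into [0, 1] by the longer string's length."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def jaro(a: str, b: str) -> float:
    if a == b:
        return 1.0
    la, lb = len(a), len(b)
    if la == 0 or lb == 0:
        return 0.0
    window = max(max(la, lb) // 2 - 1, 0)
    a_hit, b_hit = [False] * la, [False] * lb
    matches = 0
    for i, ca in enumerate(a):
        # Look for an unused matching character within the window.
        for j in range(max(0, i - window), min(lb, i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, h in zip(a, a_hit) if h]
    b_seq = [c for c, h in zip(b, b_hit) if h]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / la + matches / lb + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by a common prefix of up to four characters."""
    j = jaro(a, b)
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


if __name__ == "__main__":
    # Hypothetical pair: an English term and an assumed back-transliteration.
    pair = ("computer", "konpyuta")
    for f in (overlap_coefficient, jaccard, cosine, levenshtein_similarity, jaro_winkler):
        print(f.__name__, round(f(*pair), 3))
```

All five scores lie in [0, 1], so a single match threshold could be applied to any of them, though such a threshold would presumably need tuning per dataset and conversion, consistent with the abstract's observation that these scores vary with data and models.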


Formalizing Word Sampling for Vocabulary Prediction as Graph-based Active Learning
Yo Ehara | Yusuke Miyao | Hidekazu Oiwa | Issei Sato | Hiroshi Nakagawa
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Common Space Embedding of Primal-Dual Relation Semantic Spaces
Hidekazu Oiwa | Jun’ichi Tsujii
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers


Understanding seed selection in bootstrapping
Yo Ehara | Issei Sato | Hidekazu Oiwa | Hiroshi Nakagawa
Proceedings of TextGraphs-8 Graph-based Methods for Natural Language Processing


Mining Words in the Minds of Second Language Learners: Learner-Specific Word Difficulty
Yo Ehara | Issei Sato | Hidekazu Oiwa | Hiroshi Nakagawa
Proceedings of COLING 2012