Takamasa Oshikiri


2017

Segmentation-Free Word Embedding for Unsegmented Languages
Takamasa Oshikiri
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

In this paper, we propose a new word embedding pipeline for unsegmented languages, called segmentation-free word embedding, which does not require word segmentation as a preprocessing step. Unlike space-delimited languages, unsegmented languages such as Chinese and Japanese require word segmentation as a preprocessing step. However, word segmentation, which often requires manually annotated resources, is difficult and expensive, and unavoidable segmentation errors affect downstream tasks. To avoid these problems when learning word vectors for unsegmented languages, we consider word co-occurrence statistics over all possible segmentation candidates based on frequent character n-grams, instead of segmented sentences provided by conventional word segmenters. Our experiments on noun category prediction tasks using raw Twitter, Weibo, and Wikipedia corpora show that the proposed method outperforms conventional approaches that require word segmenters.
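The idea in the abstract above, counting co-occurrences over candidate segmentations built from frequent character n-grams, can be illustrated with a small sketch. This is not the authors' implementation: the function names `frequent_ngrams` and `cooccurrence_counts`, the parameters `max_n` and `top_k`, and the choice to count only directly adjacent n-gram pairs are all assumptions made for illustration.

```python
from collections import Counter


def frequent_ngrams(corpus, max_n=3, top_k=50):
    """Return the top_k most frequent character n-grams (hypothetical helper).

    These n-grams stand in for the candidate "words"; no segmenter is used.
    """
    counts = Counter()
    for sentence in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    return {gram for gram, _ in counts.most_common(top_k)}


def cooccurrence_counts(corpus, vocab, max_n=3):
    """Count adjacent pairs of candidate n-grams at every character position.

    Assumption: a pair co-occurs when the right n-gram starts exactly where
    the left one ends, so every candidate boundary contributes to the counts.
    """
    pairs = Counter()
    for sentence in corpus:
        for i in range(len(sentence)):
            for n in range(1, max_n + 1):
                left = sentence[i:i + n]
                if len(left) < n or left not in vocab:
                    continue
                start = i + n
                for m in range(1, max_n + 1):
                    right = sentence[start:start + m]
                    if len(right) == m and right in vocab:
                        pairs[(left, right)] += 1
    return pairs


if __name__ == "__main__":
    # Toy unsegmented corpus; real input would be raw text such as tweets.
    corpus = ["東京都に住む", "京都に行く"]
    vocab = frequent_ngrams(corpus, max_n=2, top_k=20)
    for pair, count in cooccurrence_counts(corpus, vocab, max_n=2).most_common(5):
        print(pair, count)
```

The resulting pair counts could then feed any co-occurrence-based embedding method; which one the paper uses is not stated in the abstract, so that step is left out here.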

Spectral Graph-Based Method of Multimodal Word Embedding
Kazuki Fukui | Takamasa Oshikiri | Hidetoshi Shimodaira
Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing

In this paper, we propose a novel method for multimodal word embedding, which exploits a generalized framework of multi-view spectral graph embedding to take into account the visual appearances or scenes denoted by words in a corpus. We evaluated our method on word similarity tasks and a concept-to-image search task, and found that it provides word representations that reflect visual information, while somewhat trading off performance on the word similarity tasks. Moreover, we demonstrate that our method captures multimodal linguistic regularities, which enable recovering relational similarities between words and images by vector arithmetic.

2016

Cross-Lingual Word Representations via Spectral Graph Embeddings
Takamasa Oshikiri | Kazuki Fukui | Hidetoshi Shimodaira
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)