Kumiko Tanaka-Ishii

Also published as: Kumiko Tanaka


2020

Stock Embeddings Acquired from News Articles and Price History, and an Application to Portfolio Optimization
Xin Du | Kumiko Tanaka-Ishii
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Previous work that integrated news articles to better process stock prices used a variety of neural networks to predict price movements. Because both the textual and price information were encoded inside the neural network, such approaches are difficult to apply outside their original framework, the notoriously hard problem of price prediction. In contrast, this paper presents a method to encode the influence of news articles through a vector representation of stocks called a stock embedding. The stock embedding is acquired with a deep learning framework using both news articles and price history. Because the embedding takes the operational form of a vector, it is applicable to other financial problems besides price prediction. As one example application, we show the results of portfolio optimization using Reuters & Bloomberg headlines, producing a capital gain 2.8 times larger than that obtained with a baseline method using only stock price data. This suggests that the proposed stock embedding can leverage textual financial semantics to solve financial prediction problems.
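
As a rough illustration of the idea (a minimal sketch, not the authors' exact architecture), the code below treats a learned per-stock embedding as an attention query over encoded news headlines and combines the attended news vector with price-history features; all dimensions, variable names, and the linear read-out are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
d = 64                        # embedding dimension (assumed)
n_news = 5                    # headlines observed for one stock on one day

stock_emb = rng.normal(size=d)                  # stock embedding (random stand-in for a learned vector)
news_vecs = rng.normal(size=(n_news, d))        # headline encodings, e.g. from a text encoder
price_feats = rng.normal(scale=0.01, size=10)   # recent price-history features

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Attention: how relevant each headline is to this particular stock.
weights = softmax(news_vecs @ stock_emb / np.sqrt(d))
news_summary = weights @ news_vecs              # stock-specific summary of the day's news

# A read-out over [news summary ; price features] would be trained end-to-end
# to predict the price movement; here its parameters are random placeholders.
w = rng.normal(size=d + price_feats.size)
score = float(w @ np.concatenate([news_summary, price_feats]))
print("attention weights:", np.round(weights, 3))
print("movement score:", round(score, 4))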

2019

Evaluating Computational Language Models with Scaling Properties of Natural Language
Shuntaro Takahashi | Kumiko Tanaka-Ishii
Computational Linguistics, Volume 45, Issue 3 - September 2019

In this article, we evaluate computational models of natural language with respect to the universal statistical behaviors of natural language. Statistical mechanical analyses have revealed that natural language text is characterized by scaling properties, which quantify the global structure in the vocabulary population and the long memory of a text. We study whether five scaling properties (given by Zipf’s law, Heaps’ law, Ebeling’s method, Taylor’s law, and long-range correlation analysis) can serve to evaluate computational models. Specifically, we test n-gram language models, a probabilistic context-free grammar, language models based on Simon/Pitman-Yor processes, neural language models, and generative adversarial networks for text generation. Our analysis reveals that language models based on recurrent neural networks with a gating mechanism (i.e., long short-term memory, a gated recurrent unit, and quasi-recurrent neural networks) are the only computational models that can reproduce the long memory behavior of natural language. Furthermore, through comparison with recently proposed model-based evaluation methods, we find that the exponent of Taylor’s law is a good indicator of model quality.
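
For concreteness, the sketch below estimates two of the scaling properties listed above, Zipf's law (rank-frequency) and Heaps' law (vocabulary growth), from a whitespace-tokenized text, so that text generated by a model can be compared with natural text on the same exponents; the input file and the log-log least-squares fits are simplifying assumptions rather than the article's exact procedure.

import numpy as np
from collections import Counter

def zipf_exponent(tokens):
    # Zipf's law: frequency of the r-th most frequent word ~ r^(-alpha).
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope

def heaps_exponent(tokens, n_points=50):
    # Heaps' law: vocabulary size after n tokens ~ n^beta.
    ns = np.unique(np.linspace(1, len(tokens), n_points, dtype=int))
    vs = np.array([len(set(tokens[:n])) for n in ns], dtype=float)
    slope, _ = np.polyfit(np.log(ns), np.log(vs), 1)
    return slope

tokens = open("sample.txt").read().split()   # hypothetical whitespace-tokenized text
print("Zipf alpha :", round(zipf_exponent(tokens), 2))
print("Heaps beta :", round(heaps_exponent(tokens), 2))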

2018

Taylor’s law for Human Linguistic Sequences
Tatsuru Kobayashi | Kumiko Tanaka-Ishii
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Taylor’s law describes the fluctuation characteristics underlying a system in which the variance of an event within a time span grows by a power law with respect to the mean. Although Taylor’s law has been applied in many natural and social systems, its application to language has been scarce. This article describes a new way to quantify Taylor’s law in natural language and conducts Taylor analysis of over 1100 texts across 14 languages. We found that the Taylor exponents of written natural language texts exhibit almost the same value. The exponent was also compared for other language-related data, such as child-directed speech, music, and programming languages. The results show how the Taylor exponent serves to quantify the fundamental structural complexity underlying linguistic time series. The article also shows the applicability of these findings in evaluating language models.
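
A minimal sketch of the kind of Taylor analysis described above, under simplifying assumptions (non-overlapping windows of a fixed size and an ordinary least-squares fit in log-log space): for each word, the mean and variance of its counts across windows are measured, and the Taylor exponent is the slope relating log variance to log mean (variance ~ mean^alpha).

import numpy as np
from collections import Counter

def taylor_exponent(tokens, window=1000):
    # Count each word in non-overlapping windows of fixed length.
    spans = [Counter(tokens[i:i + window])
             for i in range(0, len(tokens) - window + 1, window)]
    vocab = set().union(*spans)
    means, variances = [], []
    for word in vocab:
        counts = np.array([s[word] for s in spans], dtype=float)
        if counts.var() > 0:                 # keep words with non-degenerate statistics
            means.append(counts.mean())
            variances.append(counts.var())
    # Slope of log(variance) vs. log(mean) gives the Taylor exponent alpha.
    alpha, _ = np.polyfit(np.log(means), np.log(variances), 1)
    return alpha

tokens = open("sample.txt").read().split()   # hypothetical whitespace-tokenized text
print("Taylor alpha:", round(taylor_exponent(tokens), 2))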

2016

Upper Bound of Entropy Rate Revisited —A New Extrapolation of Compressed Large-Scale Corpora—
Ryosuke Takahira | Kumiko Tanaka-Ishii | Łukasz Dębowski
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

The article presents results of entropy rate estimation for human language across six languages, using large, state-of-the-art corpora of up to 7.8 gigabytes. To obtain estimates for data length tending to infinity, we use an extrapolation function given by an ansatz. Whereas some ansatzes of this kind were proposed in previous research papers, here we introduce a stretched exponential extrapolation function that has a smaller error of fit. In this way, we uncover the possibility that the entropy rates of human languages are positive but 20% smaller than previously reported.
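
To illustrate the extrapolation step, the sketch below compresses growing prefixes of a corpus, measures the compressed bit rate, and fits a generic stretched-exponential decay whose limit is read off as the entropy-rate estimate; the compressor, the corpus file, and the exact functional form are illustrative assumptions and differ from the article's ansatz in detail.

import lzma
import numpy as np
from scipy.optimize import curve_fit

text = open("corpus.txt", "rb").read()           # hypothetical large corpus (bytes)
sizes = np.logspace(4, np.log10(len(text)), 20).astype(int)

# Bits per byte (roughly per character for plain ASCII text) of the
# compressed output for growing prefixes of the corpus.
rates = np.array([8 * len(lzma.compress(text[:n])) / n for n in sizes])

def ansatz(n, h_inf, amp, beta):
    # Generic stretched-exponential decay toward the limit h_inf, which is
    # read off as the extrapolated entropy rate for data length -> infinity.
    return h_inf + amp * np.exp(-((n / 1e6) ** beta))

params, _ = curve_fit(ansatz, sizes, rates, p0=[1.0, 3.0, 0.3], maxfev=10000)
print(f"extrapolated entropy rate: {params[0]:.2f} bits per character")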

2015

Computational Constancy Measures of Texts—Yule’s K and Rényi’s Entropy
Kumiko Tanaka-Ishii | Shunsuke Aihara
Computational Linguistics, Volume 41, Issue 3 - September 2015

2012

Text Segmentation by Language Using Minimum Description Length
Hiroshi Yamaguchi | Kumiko Tanaka-Ishii
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Verb Temporality Analysis using Reichenbach’s Tense System
André Horie | Kumiko Tanaka-Ishii | Mitsuru Ishizuka
Proceedings of COLING 2012: Posters

2011

Relational Lasso —An Improved Method Using the Relations Among Features—
Kotaro Kitagawa | Kumiko Tanaka-Ishii
Proceedings of 5th International Joint Conference on Natural Language Processing

2010

Tree-Based Deterministic Dependency Parsing — An Application to Nivre’s Method —
Kotaro Kitagawa | Kumiko Tanaka-Ishii
Proceedings of the ACL 2010 Conference Short Papers

YouBot: A Simple Framework for Building Virtual Networking Agents
Seiji Takegata | Kumiko Tanaka-Ishii
Proceedings of the SIGDIAL 2010 Conference

Sorting Texts by Readability
Kumiko Tanaka-Ishii | Satoshi Tezuka | Hiroshi Terada
Computational Linguistics, Volume 36, Number 2, June 2010

2009

Multilingual Spectral Clustering Using Document Similarity Propagation
Dani Yogatama | Kumiko Tanaka-Ishii
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Multilingual Text Entry using Automatic Language Detection
Yo Ehara | Kumiko Tanaka-Ishii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

2006

Unsupervised Segmentation of Chinese Text by Use of Branching Entropy
Zhihui Jin | Kumiko Tanaka-Ishii
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

Entropy as an Indicator of Context Boundaries: An Experiment Using a Web Search Engine
Kumiko Tanaka-Ishii
Second International Joint Conference on Natural Language Processing: Full Papers

2003

Acquiring Vocabulary for Predictive Text Entry through Dynamic Reuse of a Small User Corpus
Kumiko Tanaka-Ishii | Daichi Hayakawa | Masato Takeichi
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics

Kiwi: A Multilingual Usage Consultation Tool based on Internet Searching
Kumiko Tanaka-Ishii | Masato Yamamoto | Hiroshi Nakagawa
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics

2002

Entering Text with a Four-Button Device
Kumiko Tanaka-Ishii | Yusuke Inutsuka | Masato Takeichi
COLING 2002: The 19th International Conference on Computational Linguistics

2001

Japanese Text Input System With Digits
Kumiko Tanaka-Ishii | Yusuke Inutsuka | Masato Takeichi
Proceedings of the First International Conference on Human Language Technology Research

2000

Multi-Agent Explanation Strategies in Real-Time Domains
Kumiko Tanaka-Ishii | Ian Frank
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1998

Reactive Content Selection in the Generation of Real-time Soccer Commentary
Kumiko Tanaka-Ishii | Koiti Hasida | Itsuki Noda
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

Clustering Co-occurrence Graph based on Transitivity
Kumiko Tanaka-Ishii
Fifth Workshop on Very Large Corpora

1996

Extraction of Lexical Translations from Non-Aligned Corpora
Kumiko Tanaka | Hideya Iwasaki
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

1994

Construction of a Bilingual Dictionary Intermediated by a Third Language
Kumiko Tanaka | Kyoji Umemura
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics