Kumiko Tanaka-Ishii
Also published as: Kumiko Tanaka
2025
A New Formulation of Zipf’s Meaning-Frequency Law through Contextual Diversity
Ryo Nagata | Kumiko Tanaka-Ishii
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper proposes formulating Zipf’s meaning-frequency law, the power law between word frequency and the number of meanings, as a relationship between word frequency and contextual diversity. The proposed formulation quantifies meaning counts as contextual diversity, which is based on the directions of contextualized word vectors obtained from a Language Model (LM). This formulation gives a new interpretation to the law and also enables us to examine it for a wider variety of words and corpora than previous studies have explored. In addition, this paper shows that the law becomes unobservable when the size of the LM used is small and that autoregressive LMs require many more parameters than masked LMs to be able to observe the law.
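The abstract above quantifies meaning counts through the directional spread of a word's contextualized vectors. A minimal sketch of one such directional-dispersion measure, assuming diversity is taken as one minus the norm of the mean unit vector (an illustrative choice, not necessarily the paper's exact formulation):

```python
import numpy as np

def contextual_diversity(vectors):
    """Directional dispersion of a word's contextualized vectors.

    Illustrative measure (not necessarily the paper's): project each
    vector onto the unit sphere, then return 1 - ||mean unit vector||.
    Identical directions give 0; widely spread directions approach 1.
    """
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return 1.0 - np.linalg.norm(unit.mean(axis=0))

# Toy check: five identical "contexts" should give diversity near 0
same = np.tile([1.0, 2.0, 3.0], (5, 1))
print(round(abs(contextual_diversity(same)), 3))  # → 0.0
```

Under this reading, Zipf’s meaning-frequency law becomes a power law between a word's corpus frequency and this diversity score.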
2020
Stock Embeddings Acquired from News Articles and Price History, and an Application to Portfolio Optimization
Xin Du | Kumiko Tanaka-Ishii
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Previous works that integrated news articles to better process stock prices used a variety of neural networks to predict price movements. Because both the textual and the price information were encoded inside the neural network, it is difficult to apply this approach in situations other than the original framework of the notoriously hard problem of price prediction. In contrast, this paper presents a method to encode the influence of news articles through a vector representation of stocks called a stock embedding. The stock embedding is acquired with a deep learning framework using both news articles and price history. Because the embedding takes the operational form of a vector, it is applicable to other financial problems besides price prediction. As one example application, we show the results of portfolio optimization using Reuters & Bloomberg headlines, producing a capital gain 2.8 times larger than that obtained with a baseline method using only stock price data. This suggests that the proposed stock embedding can leverage textual financial semantics to solve financial prediction problems.
2019
Evaluating Computational Language Models with Scaling Properties of Natural Language
Shuntaro Takahashi | Kumiko Tanaka-Ishii
Computational Linguistics, Volume 45, Issue 3 - September 2019
In this article, we evaluate computational models of natural language with respect to the universal statistical behaviors of natural language. Statistical mechanical analyses have revealed that natural language text is characterized by scaling properties, which quantify the global structure in the vocabulary population and the long memory of a text. We study whether five scaling properties (given by Zipf’s law, Heaps’ law, Ebeling’s method, Taylor’s law, and long-range correlation analysis) can serve to evaluate computational models. Specifically, we test n-gram language models, a probabilistic context-free grammar, language models based on Simon/Pitman-Yor processes, neural language models, and generative adversarial networks for text generation. Our analysis reveals that language models based on recurrent neural networks with a gating mechanism (i.e., long short-term memory; a gated recurrent unit; and quasi-recurrent neural networks) are the only computational models that can reproduce the long memory behavior of natural language. Furthermore, through comparison with recently proposed model-based evaluation methods, we find that the exponent of Taylor’s law is a good indicator of model quality.
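The first of the scaling properties listed above, Zipf’s law, can be checked on any token sequence with a simple rank-frequency fit. A sketch of such a diagnostic, with the fitting details assumed for illustration rather than taken from the article's evaluation code:

```python
from collections import Counter
import numpy as np

def zipf_exponent(tokens):
    """Fit the rank-frequency power law f(r) ∝ r^(-alpha) in log-log space.

    A simplified diagnostic: sort type frequencies in descending order
    and fit a line to (log rank, log frequency) by least squares.
    """
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True),
                     dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return -slope
```

Natural language text typically yields an exponent near 1; text generated by a poor model can deviate noticeably, which is what makes such scaling properties usable as evaluation criteria.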
2018
Taylor’s law for Human Linguistic Sequences
Tatsuru Kobayashi | Kumiko Tanaka-Ishii
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Taylor’s law describes the fluctuation characteristics underlying a system in which the variance of an event within a time span grows by a power law with respect to the mean. Although Taylor’s law has been applied in many natural and social systems, its application to language has been scarce. This article describes a new way to quantify Taylor’s law in natural language and conducts Taylor analysis of over 1100 texts across 14 languages. We found that the Taylor exponents of natural language written texts exhibit almost the same value. The exponent was also compared for other language-related data, such as child-directed speech, music, and programming languages. The results show how the Taylor exponent serves to quantify the fundamental structural complexity underlying linguistic time series. The article also shows the applicability of these findings in evaluating language models.
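The variance-versus-mean power law described above can be estimated from windowed word counts: compute each word type's mean and variance over fixed-size windows, then fit the exponent in log-log space. A simplified illustration, with the window scheme and least-squares fit assumed for this sketch rather than taken from the article:

```python
import numpy as np

def taylor_exponent(tokens, window=1000):
    """Estimate alpha in var(f) ∝ mean(f)^alpha over word-count windows.

    For each word type, count occurrences in non-overlapping windows,
    then fit a line to (log mean, log variance) across all types.
    """
    n_windows = len(tokens) // window
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((n_windows, len(vocab)))
    for k in range(n_windows):
        for w in tokens[k * window:(k + 1) * window]:
            counts[k, index[w]] += 1
    mean, var = counts.mean(axis=0), counts.var(axis=0)
    mask = (mean > 0) & (var > 0)
    slope, _ = np.polyfit(np.log(mean[mask]), np.log(var[mask]), 1)
    return slope
```

An i.i.d. token sequence gives an exponent near 1 (Poisson-like fluctuation); real texts, with their burstiness and long memory, deviate from that baseline, which is what the exponent measures.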
2016
Upper Bound of Entropy Rate Revisited —A New Extrapolation of Compressed Large-Scale Corpora—
Ryosuke Takahira | Kumiko Tanaka-Ishii | Łukasz Dębowski
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)
The article presents results of entropy rate estimation for six human languages by using large, state-of-the-art corpora of up to 7.8 gigabytes. To obtain the estimates for data length tending to infinity, we use an extrapolation function given by an ansatz. Whereas some ansatzes of this kind were proposed in previous research papers, here we introduce a stretched exponential extrapolation function that has a smaller error of fit. In this way, we uncover the possibility that the entropy rates of human languages are positive but 20% smaller than previously reported.
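The extrapolation idea can be illustrated by fitting a stretched-exponential ansatz to a rate-versus-size curve and reading off its limit. Everything below is a sketch on synthetic data: the exact functional form and all parameter values are assumptions for illustration, not the paper's reported function or estimates.

```python
import numpy as np
from scipy.optimize import curve_fit

def stretched(n, h, a, beta):
    # Illustrative stretched-exponential ansatz: for 0 < beta < 1,
    # the curve decays toward the limit h as n -> infinity.
    return h * np.exp(a * n ** (beta - 1.0))

# Synthetic "compression rate vs. data size" curve with known limit h = 1.2
n = np.logspace(3, 8, 30)
rate = stretched(n, 1.2, 8.0, 0.7)

# Recover the entropy-rate limit by least-squares fitting;
# bounding beta below 1 keeps the exponent from overflowing.
(h, a, beta), _ = curve_fit(stretched, n, rate,
                            p0=[1.0, 5.0, 0.5],
                            bounds=([0, 0, 0], [10, 20, 0.999]))
print(round(h, 2))
```

The fitted `h` is the extrapolated entropy rate for data length tending to infinity; comparing such fits across candidate ansatzes by their residual error is the kind of model selection the abstract refers to.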
2015
Computational Constancy Measures of Texts—Yule’s K and Rényi’s Entropy
Kumiko Tanaka-Ishii | Shunsuke Aihara
Computational Linguistics, Volume 41, Issue 3 - September 2015
2012
Verb Temporality Analysis using Reichenbach’s Tense System
André Horie | Kumiko Tanaka-Ishii | Mitsuru Ishizuka
Proceedings of COLING 2012: Posters
Text Segmentation by Language Using Minimum Description Length
Hiroshi Yamaguchi | Kumiko Tanaka-Ishii
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2011
Relational Lasso —An Improved Method Using the Relations Among Features—
Kotaro Kitagawa | Kumiko Tanaka-Ishii
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
YouBot: A Simple Framework for Building Virtual Networking Agents
Seiji Takegata | Kumiko Tanaka-Ishii
Proceedings of the SIGDIAL 2010 Conference
Tree-Based Deterministic Dependency Parsing — An Application to Nivre’s Method —
Kotaro Kitagawa | Kumiko Tanaka-Ishii
Proceedings of the ACL 2010 Conference Short Papers
Sorting Texts by Readability
Kumiko Tanaka-Ishii | Satoshi Tezuka | Hiroshi Terada
Computational Linguistics, Volume 36, Number 2, June 2010
2009
Multilingual Spectral Clustering Using Document Similarity Propagation
Dani Yogatama | Kumiko Tanaka-Ishii
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing
2008
Multilingual Text Entry using Automatic Language Detection
Yo Ehara | Kumiko Tanaka-Ishii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I
2006
Unsupervised Segmentation of Chinese Text by Use of Branching Entropy
Zhihui Jin | Kumiko Tanaka-Ishii
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions
2005
Entropy as an Indicator of Context Boundaries: An Experiment Using a Web Search Engine
Kumiko Tanaka-Ishii
Second International Joint Conference on Natural Language Processing: Full Papers
2003
Kiwi: A Multilingual Usage Consultation Tool based on Internet Searching
Kumiko Tanaka-Ishii | Masato Yamamoto | Hiroshi Nakagawa
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics
Acquiring Vocabulary for Predictive Text Entry through Dynamic Reuse of a Small User Corpus
Kumiko Tanaka-Ishii | Daichi Hayakawa | Masato Takeichi
Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics
2002
Entering Text with a Four-Button Device
Kumiko Tanaka-Ishii | Yusuke Inutsuka | Masato Takeichi
COLING 2002: The 19th International Conference on Computational Linguistics
2001
Japanese Text Input System With Digits
Kumiko Tanaka-Ishii | Yusuke Inutsuka | Masato Takeichi
Proceedings of the First International Conference on Human Language Technology Research
2000
Multi-Agent Explanation Strategies in Real-Time Domains
Kumiko Tanaka-Ishii | Ian Frank
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics
1998
Reactive Content Selection in the Generation of Real-time Soccer Commentary
Kumiko Tanaka-Ishii | Koiti Hasida | Itsuki Noda
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2
Reactive Content Selection in the Generation of Real-time Soccer Commentary
Kumiko Tanaka-Ishii | Koiti Hasida | Itsuki Noda
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics
1997
Clustering Co-occurrence Graph based on Transitivity
Kumiko Tanaka-Ishii
Fifth Workshop on Very Large Corpora
1996
Extraction of Lexical Translations from Non-Aligned Corpora
Kumiko Tanaka | Hideya Iwasaki
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics
1994
Co-authors
- Masato Takeichi 3
- Koiti Hasida 2
- Yusuke Inutsuka 2
- Kotaro Kitagawa 2
- Itsuki Noda 2
- Shunsuke Aihara 1
- Xin Du 1
- Łukasz Dębowski 1
- Yo Ehara 1
- Ian Frank 1
- Daichi Hayakawa 1
- André Horie 1
- Mitsuru Ishizuka 1
- Hideya Iwasaki 1
- Zhihui Jin 1
- Tatsuru Kobayashi 1
- Ryo Nagata 1
- Hiroshi Nakagawa 1
- Shuntaro Takahashi 1
- Ryosuke Takahira 1
- Seiji Takegata 1
- Hiroshi Terada 1
- Satoshi Tezuka 1
- Kyoji Umemura 1
- Hiroshi Yamaguchi 1
- Masato Yamamoto 1
- Dani Yogatama 1