2024
Computational Complexity of Natural Morphology Revisited
Hajime Senuma | Akiko Aizawa
Transactions of the Association for Computational Linguistics, Volume 12
This paper revisits a classical yet fundamental discussion in theoretical computational linguistics: the computational complexity of natural languages. Past studies have shown that syntax, as observed in Swiss German, is not weakly context-free. Concerning morphology, Culy (1985) used a construction in Bambara to argue that morphology is not weakly context-free either; however, Manaster-Ramer (1988) pointed out that the Bambara case is problematic because the wordhood of the construction relies on special tonal behaviors, and it is unclear whether those behaviors belong to the morphological domain, which raises doubts about whether the case is a genuinely morphological phenomenon. In this paper, we argue that Classical Ainu also defies weak context-freeness at the morphological level. The construction we introduce is unambiguously morphological, because the language’s valency-sensitive structure and valency-changing operations, such as noun incorporation, preclude a syntactic interpretation of the construction.
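As a purely schematic illustration of how such weak non-context-freeness arguments are usually structured (the copy-language reduction familiar from Culy's Bambara argument; the actual Ainu construction and data are given in the paper), one exhibits a regular language R such that intersecting it with the word set L yields a copy language:

\[
  L \cap R = \{\, w\,c\,w \mid w \in \{a, b\}^{+} \,\},
\]

which is known not to be context-free; since context-free languages are closed under intersection with regular languages, L itself cannot be weakly context-free.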
2018
Universal Dependencies for Ainu
Hajime Senuma | Akiko Aizawa
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
2017
Seq2seq for Morphological Reinflection: When Deep Learning Fails
Hajime Senuma | Akiko Aizawa
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection
Toward Universal Dependencies for Ainu
Hajime Senuma | Akiko Aizawa
Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017)
2016
Learning Succinct Models: Pipelined Compression with L1-Regularization, Hashing, Elias-Fano Indices, and Quantization
Hajime Senuma | Akiko Aizawa
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers
The recent proliferation of smart devices necessitates methods to learn small-sized models. This paper demonstrates that if there are m features in total but only n = o(√m) features are required to distinguish examples, then with Ω(log m) training examples and reasonable settings it is possible to obtain a good model in a succinct representation using n log₂(m/n) + o(m) bits, by using a pipeline of existing compression methods: L1-regularized logistic regression, feature hashing, Elias–Fano indices, and randomized quantization. An experiment shows that a noun phrase chunking model for which an existing library requires 27 megabytes can be compressed to less than 13 kilobytes without notable loss of accuracy.
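A minimal illustrative sketch of such a pipeline, assuming scikit-learn and NumPy are available; the hash size, regularization strength (C), and quantization step are arbitrary choices for illustration rather than the paper's settings, and the Elias–Fano stage is only summarized as a bit budget instead of being implemented:

# A rough sketch of the four-stage pipeline named in the abstract; all
# hyperparameters below are illustrative assumptions, not the paper's.
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["the quick brown fox", "jumps over the lazy dog",
        "colorless green ideas", "sleep furiously tonight"]
labels = [1, 1, 0, 0]

# Stages 1-2: feature hashing maps raw features into a fixed-size space, and
# L1-regularized logistic regression drives most hashed weights to exactly zero.
vec = HashingVectorizer(n_features=2 ** 16, alternate_sign=False)
X = vec.transform(docs)
clf = LogisticRegression(penalty="l1", C=10.0, solver="liblinear").fit(X, labels)

w = clf.coef_.ravel()
nonzero = np.flatnonzero(w)  # sorted positions of the surviving weights

# Stage 3 (summarized only): an Elias-Fano index can store the n sorted nonzero
# positions out of m slots in roughly n * ceil(log2(m / n)) + 2n bits.
n, m = len(nonzero), w.size
ef_bits = n * (np.ceil(np.log2(m / n)) + 2) if n else 0

# Stage 4: randomized (stochastic) rounding quantizes each surviving weight to a
# coarse grid while remaining unbiased in expectation.
def stochastic_round(x, step=0.05, rng=np.random.default_rng(0)):
    lo = np.floor(x / step) * step
    return lo + step * (rng.random(x.shape) < (x - lo) / step)

w_q = stochastic_round(w[nonzero])
print(f"{n} of {m} weights survive; index budget ~{int(ef_bits)} bits; "
      f"{len(np.unique(w_q))} distinct quantized values")

The point of the sketch is only to show how the stages compose: sparsification decides which n of the m hashed positions must be kept, the positions themselves are what an Elias–Fano index would encode succinctly, and quantization shrinks the surviving weight values.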
2011
K-means Clustering with Feature Hashing
Hajime Senuma
Proceedings of the ACL 2011 Student Session