Tsuneo Kato

2025

Wenzhou Dialect Speech to Mandarin Text Conversion
Zhipeng Gao | Akihiro Tamura | Tsuneo Kato
Proceedings of the Eighth Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2025)

The Wenzhou dialect is a Chinese dialect that is significantly distinct from Mandarin, the official language of China. It is among the most complex Chinese dialects and is nearly incomprehensible to people from regions such as Northern China, thereby creating substantial communication barriers. Therefore, the conversion between the Wenzhou dialect and Mandarin is essential to facilitate communication between Wenzhou dialect speakers and those from other Chinese regions. However, as a low-resource language, the Wenzhou dialect lacks publicly available datasets, and such conversion technologies have not been extensively researched. Thus, in this study, we create a parallel dataset containing Wenzhou dialect speech and the corresponding Mandarin text and build benchmark models for Wenzhou dialect speech-to-Mandarin text conversion. In particular, we fine-tune two self-supervised learning-based pretrained models, that is, TeleSpeech-ASR1.0 and Wav2Vec2-XLS-R, with our training dataset and report their performance on our test dataset as baselines for future research.

2023

Multimodal Neural Machine Translation Using Synthetic Images Transformed by Latent Diffusion Model
Ryoya Yuasa | Akihiro Tamura | Tomoyuki Kajiwara | Takashi Ninomiya | Tsuneo Kato
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)

This study proposes a new multimodal neural machine translation (MNMT) model using synthetic images transformed by a latent diffusion model. MNMT translates a source language sentence based on its related image, but the image usually contains noisy information that is not relevant to the source language sentence. Our proposed method first generates a synthetic image corresponding to the content of the source language sentence by using a latent diffusion model and then performs translation based on the synthetic image. The experiments on the English-German translation tasks using the Multi30k dataset demonstrate the effectiveness of the proposed method.

2022

Auxiliary Learning for Named Entity Recognition with Multiple Auxiliary Biomedical Training Data
Taiki Watanabe | Tomoya Ichikawa | Akihiro Tamura | Tomoya Iwakura | Chunpeng Ma | Tsuneo Kato
Proceedings of the 21st Workshop on Biomedical Language Processing

Named entity recognition (NER) is one of the elemental technologies used for knowledge extraction from biomedical text. As one approach to improving NER, multi-task learning, which learns a model from multiple training datasets, has been used. Among multi-task learning methods, auxiliary learning, which uses an auxiliary task to improve its target task, has shown higher NER performance with only one auxiliary task than conventional multi-task learning, which aims to improve all tasks simultaneously. We propose Multiple Utilization of NER Corpora Helpful for Auxiliary BLESsing (MUNCHABLES). MUNCHABLES utilizes multiple training datasets as auxiliary training data in two ways: the first fine-tunes the NER model of the target task by sequentially performing auxiliary learning for each auxiliary training dataset, and the second uses all training datasets in one auxiliary learning. We evaluate MUNCHABLES on eight biomedical-related domain NER tasks, where seven training datasets are used as auxiliary training data. The experimental results show that MUNCHABLES achieves higher accuracy than conventional multi-task learning methods on average while showing state-of-the-art accuracy.

This paper presents a new benchmark test dataset for multi-level complexity-controllable machine translation (MLCC-MT), which is MT controlling the complexity of the output at more than two levels. In previous research, MLCC-MT models have been evaluated on a test dataset automatically constructed from the Newsela corpus, which is a document-level comparable corpus with document-level complexity. The existing test dataset has the following three problems: (i) A source language sentence and its target language sentence are not necessarily an exact translation pair because they are automatically detected. (ii) A target language sentence and its simplified target language sentence are not necessarily exactly parallel because they are automatically aligned. (iii) A sentence-level complexity is not necessarily appropriate because it is transferred from an article-level complexity attached to the Newsela corpus. Therefore, we create a benchmark test dataset for Japanese-to-English MLCC-MT from the Newsela corpus by introducing automatic filtering of data with inappropriate sentence-level complexity, a manual check for parallel target language sentences with different complexity levels, and manual translation. Moreover, we implement two MLCC-NMT frameworks with a Transformer architecture and report their performance on our test dataset as baselines for future research. Our test dataset and code are released.

2021

Contrastive Response Pairs for Automatic Evaluation of Non-task-oriented Neural Conversational Models
Koshiro Okano | Yu Suzuki | Masaya Kawamura | Tsuneo Kato | Akihiro Tamura | Jianming Wu
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Responses generated by neural conversational models (NCMs) for non-task-oriented systems are difficult to evaluate. We propose contrastive response pairs (CRPs) for automatically evaluating responses from non-task-oriented NCMs. We conducted an error analysis on responses generated by an encoder-decoder recurrent neural network (RNN) type NCM and created three types of CRPs corresponding to the three most frequent errors found in the analysis. Three NCMs of different response quality were objectively evaluated with the CRPs and compared to a subjective assessment. The correctness obtained by the three types of CRPs was consistent with the results of the subjective assessment.

2017

Utterance Intent Classification of a Spoken Dialogue System with Efficiently Untied Recursive Autoencoders
Tsuneo Kato | Atsushi Nagai | Naoki Noda | Ryosuke Sumitomo | Jianming Wu | Seiichi Yamamoto
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue

Recursive autoencoders (RAEs) for compositionality of a vector space model were applied to utterance intent classification of a smartphone-based Japanese-language spoken dialogue system. Though the RAEs express a nonlinear operation on the vectors of child nodes, the operation is considered to differ intrinsically depending on the types of the child nodes. To relax the difference, a data-driven untying of autoencoders (AEs) is proposed. The experimental results of the utterance intent classification showed improved accuracy with the proposed method compared with the basic tied RAE and an untied RAE based on a manual rule.

2016

Joining-in-type Humanoid Robot Assisted Language Learning System
AlBara Khalifa | Tsuneo Kato | Seiichi Yamamoto
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Dialogue robots are attractive to people, and in language learning systems, they motivate learners and let them practice conversational skills in a more realistic environment. However, automatic speech recognition (ASR) of second language (L2) learners is still a challenge, because their speech not only contains pronunciation, lexical, and grammatical errors but is sometimes totally disordered. Hence, we propose a novel robot assisted language learning (RALL) system using two robots, one as a teacher and the other as an advanced learner. The system is designed to simulate multiparty conversation, expecting implicit learning and enhancement of predictability of learners’ utterances through an alignment similar to “interactive alignment”, which is observed in human-human conversation. We collected a database with the prototypes and measured, through an initial analysis, how much of the alignment phenomenon is observed in the database.
