Takashi Harada


2024

Can Impressions of Music be Extracted from Thumbnail Images?
Takashi Harada | Takehiro Motomitsu | Katsuhiko Hayashi | Yusuke Sakai | Hidetaka Kamigaito
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)

In recent years, there has been a notable increase in research on machine learning models for music retrieval and generation systems that are capable of taking natural language sentences as inputs. However, there is a scarcity of large-scale publicly available datasets consisting of music data and their corresponding natural language descriptions, known as music captions. In particular, non-musical information such as suitable situations for listening to a track and the emotions elicited upon listening is crucial for describing music. This type of information is underrepresented in existing music caption datasets due to the challenges associated with extracting it directly from music data. To address this issue, we propose a method for generating music caption data that incorporates non-musical aspects inferred from music thumbnail images, and validate the effectiveness of our approach through human evaluations.

2008

Creation of Learner Corpus and Its Application to Speech Recognition
Hiroki Yamazaki | Keisuke Kitamura | Takashi Harada | Seiichi Yamamoto
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Widely spoken languages such as English are used by many people whose mother tongues are different. Their second-language speech often has not only a distinct accent but also different lexical and syntactic characteristics. Speech recognition performance is severely affected when the lexical, syntactic, or semantic characteristics of the training and recognition tasks differ. The language model of a speech recognition system is usually trained on transcribed speech data or text data collected in native English-speaking countries; speech recognition performance is therefore expected to degrade because of the mismatch in lexical and syntactic characteristics between native speakers and second-language speakers, as well as the differences in their accents. The aim of language model adaptation is to exploit specific, albeit limited, knowledge about the recognition task to compensate for the mismatch in lexical, syntactic, or semantic characteristics. This paper examines whether language model adaptation is effective in compensating for the mismatch between the lexical, syntactic, or semantic characteristics of native speakers and second-language speakers.