Emotion plays a vital role in making human communication a rich and dynamic experience. As more automated systems enter our daily lives, it becomes increasingly important to incorporate emotion so that interaction is as natural as possible. Achieving this requires rich sets of labeled emotional data. However, existing Japanese emotion databases are still limited to unimodal and bimodal corpora. Because emotion is expressed not only through speech but also visually at the same time, it is essential to include multiple modalities in an observation. In this paper, we present the first audio-visual emotion corpus in Japanese, collected from 14 native speakers. The corpus contains 100 minutes of annotated and transcribed material. We performed preliminary emotion recognition experiments on the corpus and achieved an accuracy of 61.42% for five classes of emotion.