Soichiro Sugihara


2024

pdf bib
Semi-automatic Construction of a Word Complexity Lexicon for Japanese Medical Terminology
Soichiro Sugihara | Tomoyuki Kajiwara | Takashi Ninomiya | Shoko Wakamiya | Eiji Aramaki
Proceedings of the 6th Clinical Natural Language Processing Workshop

We construct a word complexity lexicon for medical terms in Japanese.To facilitate communication between medical practitioners and patients, medical text simplification is being studied.Medical text simplification is a natural language processing task that paraphrases complex technical terms into expressions that patients can understand.However, in contrast to English, where this task is being actively studied, there are insufficient language resources in Japanese.As a first step in advancing research on medical text simplification in Japanese, we annotate the 370,000 words from a large-scale medical terminology lexicon with a five-point scale of complexity for patients.

pdf bib
Utilizing Longer Context than Speech Bubbles in Automated Manga Translation
Hiroto Kaino | Soichiro Sugihara | Tomoyuki Kajiwara | Takashi Ninomiya | Joshua B. Tanner | Shonosuke Ishiwatari
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

This paper focuses on improving the performance of machine translation for manga (Japanese-style comics). In manga machine translation, text consists of a sequence of speech bubbles and each speech bubble is translated individually. However, each speech bubble itself does not contain sufficient information for translation. Therefore, previous work has proposed methods to use contextual information, such as the previous speech bubble, speech bubbles within the same scene, and corresponding scene images. In this research, we propose two new approaches to capture broader contextual information. Our first approach involves scene-based translation that considers the previous scene. The second approach considers broader context information, including details about the work, author, and manga genre. Through our experiments, we confirm that each of our methods improves translation quality, with the combination of both methods achieving the highest quality. Additionally, detailed analysis reveals the effect of zero-anaphora resolution in translation, such as supplying missing subjects not mentioned within a scene, highlighting the usefulness of longer contextual information in manga machine translation.