Masaharu Yoshioka


Coding Open-Ended Responses using Pseudo Response Generation by Large Language Models
Yuki Zenimoto | Ryo Hasegawa | Takehito Utsuro | Masaharu Yoshioka | Noriko Kando
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)

Survey research using open-ended responses is an important method that contributes to the discovery of unknown issues and new needs. However, survey research generally requires time and cost-consuming manual data processing, indicating that it is difficult to analyze large dataset. To address this issue, we propose an LLM-based method to automate parts of the grounded theory approach (GTA), a representative approach of the qualitative data analysis. We generated and annotated pseudo open-ended responses, and used them as the training data for the coding procedures of GTA. Through evaluations, we showed that the models trained with pseudo open-ended responses are quite effective compared with those trained with manually annotated open-ended responses. We also demonstrate that the LLM-based approach is highly efficient and cost-saving compared to human-based approach.


Measuring Beginner Friendliness of Japanese Web Pages explaining Academic Concepts by Integrating Neural Image Feature and Text Features
Hayato Shiokawa | Kota Kawaguchi | Bingcai Han | Takehito Utsuro | Yasuhide Kawada | Masaharu Yoshioka | Noriko Kando
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

Search engines are an important tool for modern academic study, but their results lack any measure of beginner friendliness. To make search engines more efficient for academic study, a technique is needed for measuring the beginner friendliness of a Web page explaining academic concepts, along with an automatic measurement system. This paper studies how to integrate heterogeneous features: a neural image feature generated from the image of the Web page by a variant of a CNN (convolutional neural network), and text features extracted from the body text of the HTML file of the Web page. Integration is performed within the framework of SVM classifier learning. Evaluation results show that the combined heterogeneous features perform better than each individual type of feature.


Automatic Annotation of Parameters from Nanodevice Development Research Papers
Thaer M. Dieb | Masaharu Yoshioka | Shinjiroh Hara | Marcus C. Newton
Proceedings of the 4th International Workshop on Computational Terminology (Computerm)


Time Series Topic Modeling and Bursty Topic Detection of Correlated News and Twitter
Daichi Koike | Yusuke Takahashi | Takehito Utsuro | Masaharu Yoshioka | Noriko Kando
Proceedings of the Sixth International Joint Conference on Natural Language Processing


Cross-Lingual Topic Alignment in Time Series Japanese / Chinese News
Shuo Hu | Yusuke Takahashi | Liyi Zheng | Takehito Utsuro | Masaharu Yoshioka | Noriko Kando | Tomohiro Fukuhara | Hiroshi Nakagawa | Yoji Kiyota
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation