Koichi Nagatsuka
2026
Is Micro Domain-Adaptive Pre-Training Effective for Real-World Operations? Multi-Step Evaluation Reveals Potential and Bottlenecks
Masaya Tsunokake | Yuta Koreeda | Terufumi Morishita | Koichi Nagatsuka | Hikaru Tomonari | Yasuhiro Sogawa
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 5: Industry Track)
When applied to real-world enterprise operations, LLMs need to handle proprietary knowledge in small domains of specific operations (micro domains). A previous study shows that micro domain-adaptive pre-training (mDAPT) with fewer documents is effective, similarly to DAPT in larger domains. However, it evaluates mDAPT only on multiple-choice questions; thus, its effectiveness for generative tasks in real-world operations remains unknown. We aim to reveal the potential and bottlenecks of mDAPT for generative tasks. To this end, we disentangle the answering process into three subtasks and evaluate the performance of each subtask: (1) eliciting facts relevant to questions from an LLM’s own knowledge, (2) reasoning over the facts to obtain conclusions, and (3) composing long-form answers based on the conclusions. We verified mDAPT on proprietary IT product knowledge for real-world questions in IT technical support operations. As a result, mDAPT resolved the elicitation task that the base model struggled with, but did not resolve the other subtasks. This clarifies mDAPT’s effectiveness in the knowledge aspect and its bottlenecks in other aspects. Further analysis empirically shows that resolving the elicitation and reasoning tasks ensures sufficient performance (over 90%), emphasizing the need to enhance reasoning capability.
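To illustrate the three-subtask decomposition described in the abstract, the sketch below separates elicitation, reasoning, and answer composition into independent calls so each stage can be scored in isolation. The `query_llm` stub, the prompt wording, and the function names are hypothetical and not taken from the paper.

```python
# Illustrative sketch of the three-subtask evaluation; all prompts and the
# query_llm stub are assumptions, not the paper's actual implementation.
from typing import List


def query_llm(prompt: str) -> str:
    """Placeholder for a call to the (possibly domain-adapted) LLM."""
    raise NotImplementedError("Plug in your model client here.")


def elicit_facts(question: str) -> List[str]:
    # Subtask 1: elicit facts relevant to the question from the model's own knowledge.
    response = query_llm(f"List the facts you know that are relevant to: {question}")
    return [line.strip() for line in response.splitlines() if line.strip()]


def reason_over_facts(question: str, facts: List[str]) -> str:
    # Subtask 2: reason over the given facts to reach a conclusion.
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    return query_llm(f"Facts:\n{fact_block}\n\nQuestion: {question}\nConclusion:")


def compose_answer(question: str, conclusion: str) -> str:
    # Subtask 3: compose a long-form answer grounded in the conclusion.
    return query_llm(
        f"Question: {question}\nConclusion: {conclusion}\n"
        "Write a complete long-form answer for a support engineer:"
    )
```

Feeding gold inputs to the later stages makes it possible to attribute errors to elicitation, reasoning, or composition separately, which is the kind of bottleneck analysis the abstract describes.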
2021
Pre-training a BERT with Curriculum Learning by Increasing Block-Size of Input Text
Koichi Nagatsuka | Clifford Broni-Bediako | Masayasu Atsumi
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Recently, pre-trained language representation models such as BERT and RoBERTa have achieved significant results in a wide range of natural language processing (NLP) tasks; however, pre-training them requires extremely high computational cost. Curriculum Learning (CL) is one potential solution to this problem. CL is a training strategy in which training samples are given to models in a meaningful order instead of by random sampling. In this work, we propose a new CL method that gradually increases the block-size of input text for training the self-attention mechanism of BERT and its variants, using the maximum available batch-size. Experiments in low-resource settings show that our approach outperforms the baseline in terms of both convergence speed and final performance on downstream tasks.
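A minimal sketch of the block-size curriculum idea follows: training proceeds in phases of increasing block-size, with shorter blocks permitting larger batches early on. The specific schedule values, step counts, and helper names are assumptions for illustration, not the paper's exact configuration.

```python
# Illustrative block-size curriculum for masked-LM pre-training.
# Phase lengths, block sizes, and batch sizes below are assumed values.
from typing import Iterator, List, Tuple

# Each phase: (training steps, input block size in tokens, batch size).
CURRICULUM: List[Tuple[int, int, int]] = [
    (10_000, 64, 512),
    (10_000, 128, 256),
    (10_000, 256, 128),
    (10_000, 512, 64),
]


def curriculum_schedule() -> Iterator[Tuple[int, int]]:
    """Yield (block_size, batch_size) for every training step, phase by phase."""
    for steps, block_size, batch_size in CURRICULUM:
        for _ in range(steps):
            yield block_size, batch_size


def chunk_tokens(token_ids: List[int], block_size: int) -> List[List[int]]:
    """Split a tokenized document into contiguous blocks of the current block size."""
    return [
        token_ids[i : i + block_size]
        for i in range(0, len(token_ids) - block_size + 1, block_size)
    ]


# Usage (hypothetical training loop):
# for step, (block_size, batch_size) in enumerate(curriculum_schedule()):
#     batch = sample_blocks(corpus, block_size, batch_size)  # hypothetical helper
#     loss = train_step(model, batch)                        # hypothetical helper
```

The design intuition, as stated in the abstract, is that short blocks are cheaper to process through self-attention, so early phases can use the maximum batch size that fits in memory while later phases expose the model to longer contexts.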