Hiromu Takahashi
2024
Quantifying Memorization and Detecting Training Data of Pre-trained Language Models using Japanese Newspaper
Shotaro Ishihara | Hiromu Takahashi
Proceedings of the 17th International Natural Language Generation Conference
Dominant pre-trained language models (PLMs) have been shown to pose the risk of memorizing and outputting their training data. While this concern has been discussed mainly for English, it is also practically important to focus on domain-specific PLMs. In this study, we pre-trained domain-specific GPT-2 models on a limited corpus of Japanese newspaper articles and evaluated their behavior. The experiments replicated, in Japanese, the empirical finding from previous English studies that memorization in PLMs is related to duplication in the training data, model size, and prompt length. Furthermore, we attempted membership inference attacks, demonstrating that the training data can be detected in Japanese as well, the same trend as in English. The study warns that domain-specific PLMs, sometimes trained on valuable private data, can "copy and paste" on a large scale.
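The abstract mentions membership inference attacks without detailing them; a common loss-based baseline scores a candidate text by its negative log-likelihood under the model and flags low-loss texts as likely training members. The sketch below is illustrative only, assuming a Hugging Face causal LM; the model name, decision threshold, and candidate texts are placeholders, and the paper's exact attack may differ.

```python
# Minimal sketch of a loss-based membership inference baseline for a causal LM.
# Assumptions (not from the abstract): a Hugging Face GPT-2 checkpoint, and
# membership scored by average per-token negative log-likelihood (lower loss
# is treated as evidence the text appeared in the training data).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def nll_score(model, tokenizer, text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

model_name = "gpt2"  # placeholder; the paper uses domain-specific Japanese GPT-2 models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

candidates = [
    "Example sentence that may have been seen in training.",
    "Example sentence that was certainly not seen in training.",
]
threshold = 3.0  # hypothetical threshold, tuned on held-out member/non-member data
for text in candidates:
    score = nll_score(model, tokenizer, text)
    print(f"nll={score:.3f}  predicted_member={score < threshold}  {text!r}")
```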
2022
Semantic Shift Stability: Efficient Way to Detect Performance Degradation of Word Embeddings and Pre-trained Language Models
Shotaro Ishihara | Hiromu Takahashi | Hono Shirai
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Word embeddings and pre-trained language models have become essential technical elements in natural language processing. While the general practice is to use or fine-tune publicly available models, there are significant advantages in creating or pre-training unique models that match the domain. Model performance degrades as language continuously changes and evolves, but the high cost of model building inhibits regular re-training, especially for pre-trained language models. This study proposes an efficient way to detect time-series performance degradation of word embeddings and pre-trained language models by calculating the degree of semantic shift. Monitoring performance through the proposed method supports decision-making as to whether a model should be re-trained. The experiments demonstrated that the proposed method can identify time-series performance degradation on two datasets, one Japanese and one English. The source code is available at https://github.com/Nikkei/semantic-shift-stability.
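The degree of semantic shift mentioned in the abstract can be computed, in one common formulation, by aligning word embeddings trained on different time slices and averaging cosine similarities over the shared vocabulary. Below is a minimal sketch under that assumption, using gensim word2vec and orthogonal Procrustes alignment; the function name and corpora are hypothetical, and the released implementation in the linked repository may differ.

```python
# Minimal sketch of quantifying semantic shift between two word2vec models
# trained on different time periods. Assumption (not from the abstract): shift
# is measured as mean cosine similarity over the shared vocabulary after
# orthogonal Procrustes alignment of the two embedding spaces.
import numpy as np
from gensim.models import Word2Vec
from scipy.linalg import orthogonal_procrustes

def semantic_shift_stability(model_old: Word2Vec, model_new: Word2Vec) -> float:
    """Mean cosine similarity of shared-vocabulary vectors after alignment."""
    shared = sorted(set(model_old.wv.index_to_key) & set(model_new.wv.index_to_key))
    A = np.stack([model_old.wv[w] for w in shared])
    B = np.stack([model_new.wv[w] for w in shared])
    # Rotate the old space onto the new one (length-preserving alignment).
    R, _ = orthogonal_procrustes(A, B)
    A_aligned = A @ R
    cos = np.sum(A_aligned * B, axis=1) / (
        np.linalg.norm(A_aligned, axis=1) * np.linalg.norm(B, axis=1)
    )
    return float(cos.mean())

# Usage: train one model per time slice and compare adjacent slices; a drop in
# the score suggests the corpus has shifted and re-training may be warranted.
# model_2020 = Word2Vec(corpus_2020_sentences)  # hypothetical corpora
# model_2021 = Word2Vec(corpus_2021_sentences)
# print(semantic_shift_stability(model_2020, model_2021))
```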