Title: Investigating Web Corpus Filtering Methods for Language Model Development in Japanese
Authors: Rintaro Enomoto; Arseny Tolmachev; Takuro Niitsuma; Shuhei Kurita; Daisuke Kawahara
Date: 2024-06
Type: text
Genre: conference publication
Proceedings: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Editors: Yang (Trista) Cao; Isabel Papadimitriou; Anaelia Ovalle; Marcos Zampieri; Francis Ferraro; Swabha Swayamdipta
Publisher: Association for Computational Linguistics
Place: Mexico City, Mexico
Pages: 154-160
Identifier: enomoto-etal-2024-investigating
DOI: 10.18653/v1/2024.naacl-srw.18
URL: https://aclanthology.org/2024.naacl-srw.18/