Multi-Source Text Classification for Multilingual Sentence Encoder with Machine Translation

Reon Kajikawa; Keiichiro Yamada; Tomoyuki Kajiwara; Takashi Ninomiya

doi:10.18653/v1/2024.naacl-srw.24

Abstract

To reduce the cost of training models for each language for developers of natural language processing applications, pre-trained multilingual sentence encoders are promising. However, since training corpora for such multilingual sentence encoders contain only a small amount of text in languages other than English, they suffer from performance degradation for non-English languages. To improve the performance of pre-trained multilingual sentence encoders for non-English languages, we propose a method of machine translating a source sentence into English and then inputting it together with the source sentence in a multi-source manner. Experimental results on sentiment analysis and topic classification tasks in Japanese revealed the effectiveness of the proposed method.
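The input construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the two sentences are combined inside the encoder, so this example only pairs each source sentence with its English translation. The names `toy_translate` and `build_multi_source_inputs` are hypothetical, and the dictionary lookup is a stand-in for a real machine translation system.

```python
from typing import Callable, List, Tuple


def toy_translate(sentence: str) -> str:
    """Hypothetical stand-in for a Japanese-to-English MT system.

    A real setup would call an actual translation model here; unknown
    sentences are returned unchanged in this toy version.
    """
    lookup = {"この映画は最高だ": "This movie is the best"}
    return lookup.get(sentence, sentence)


def build_multi_source_inputs(
    sentences: List[str],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each source sentence with its English machine translation."""
    return [(source, translate(source)) for source in sentences]


pairs = build_multi_source_inputs(["この映画は最高だ"], toy_translate)
print(pairs)
```

Each resulting pair holds the original sentence and its translation, so a downstream classifier built on a multilingual sentence encoder can receive both. How the pair is fed to the encoder (for example, as a single sentence-pair input) is an assumption left open here.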

Anthology ID: 2024.naacl-srw.24
Volume: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Month: June
Year: 2024
Address: Mexico City, Mexico
Editors: Yang (Trista) Cao, Isabel Papadimitriou, Anaelia Ovalle, Marcos Zampieri, Francis Ferraro, Swabha Swayamdipta
Venue: NAACL
Publisher: Association for Computational Linguistics
Pages: 226–232
URL: https://aclanthology.org/2024.naacl-srw.24
DOI: 10.18653/v1/2024.naacl-srw.24
Cite (ACL): Reon Kajikawa, Keiichiro Yamada, Tomoyuki Kajiwara, and Takashi Ninomiya. 2024. Multi-Source Text Classification for Multilingual Sentence Encoder with Machine Translation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop), pages 226–232, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal): Multi-Source Text Classification for Multilingual Sentence Encoder with Machine Translation (Kajikawa et al., NAACL 2024)
PDF: https://aclanthology.org/2024.naacl-srw.24.pdf
