MTLS: Making Texts into Linguistic Symbols

Wenlong Fei, Xiaohua Wang, Min Hu, Qingyu Zhang, Hongbo Li


Abstract
In linguistics, all languages can be considered symbolic systems, with each language relying on symbolic processes to associate specific symbols with meanings. Within a single language, there is a fixed correspondence between linguistic symbols and meanings. Across different languages, universal meanings are symbolized according to different rules, each maintaining a one-to-one correspondence with its symbols. Most existing work overlooks these properties of languages as symbolic systems. In this paper, we shift the focus to these symbolic properties and introduce MTLS: a pre-training method that improves the multilingual capability of models by Making Texts into Linguistic Symbols. First, we replace the vocabulary of pre-trained language models with mapping relations between linguistic symbols and semantics. Then, universal semantics within the symbolic system serve as bridges, linking symbols from different languages to the embedding space of the model and thereby enabling the model to process linguistic symbols. To evaluate the effectiveness of MTLS, we conducted experiments on multilingual tasks using BERT and RoBERTa as backbones. The results indicate that, despite pre-training on just over 12,000 English examples, the improvement that MTLS brings to multilingual capabilities is remarkably significant.
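The abstract describes replacing a pre-trained model's vocabulary with symbol-to-semantic mappings, with a shared semantic space bridging symbols from different languages into the encoder's embedding space. The sketch below is a minimal conceptual illustration of that idea, not the authors' released code: all class and parameter names (SymbolEmbedding, num_symbols, num_semantics, symbol_to_semantic) are hypothetical placeholders, and the soft symbol-to-semantic assignment is one assumed way to realize the bridge.

```python
# Conceptual sketch only: routes language-specific "symbol" ids through a
# shared semantic embedding table before feeding a pre-trained encoder
# (here BERT), bypassing the encoder's original word-embedding vocabulary.
import torch
import torch.nn as nn


class SymbolEmbedding(nn.Module):
    """Maps language-specific symbol ids onto a shared semantic space."""

    def __init__(self, num_symbols: int, num_semantics: int, hidden_size: int):
        super().__init__()
        # Shared table of "universal semantics" acting as the bridge
        # between the symbol systems of different languages.
        self.semantic_embeddings = nn.Embedding(num_semantics, hidden_size)
        # Learnable assignment of each symbol to the semantic table
        # (a soft, trainable stand-in for the symbol-semantic mapping).
        self.symbol_to_semantic = nn.Embedding(num_symbols, num_semantics)

    def forward(self, symbol_ids: torch.LongTensor) -> torch.Tensor:
        # Soft weights over semantics for every symbol in the batch,
        # then a weighted mix of the shared semantic vectors.
        semantic_weights = torch.softmax(self.symbol_to_semantic(symbol_ids), dim=-1)
        return semantic_weights @ self.semantic_embeddings.weight


if __name__ == "__main__":
    from transformers import BertModel

    # Keep the pre-trained encoder intact and feed it symbol-derived
    # embeddings via inputs_embeds instead of token ids.
    model = BertModel.from_pretrained("bert-base-uncased")
    symbol_embed = SymbolEmbedding(
        num_symbols=30000, num_semantics=8192, hidden_size=model.config.hidden_size
    )

    symbol_ids = torch.randint(0, 30000, (2, 16))       # toy batch of symbol ids
    inputs_embeds = symbol_embed(symbol_ids)             # (2, 16, hidden_size)
    outputs = model(inputs_embeds=inputs_embeds)
    print(outputs.last_hidden_state.shape)
```

In this reading, the shared semantic table is what lets symbols from different languages land in the same embedding space; the paper itself should be consulted for how the symbol-semantic mapping is actually constructed and trained.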
Anthology ID:
2024.emnlp-main.206
Volume:
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Month:
November
Year:
2024
Address:
Miami, Florida, USA
Editors:
Yaser Al-Onaizan, Mohit Bansal, Yun-Nung Chen
Venue:
EMNLP
Publisher:
Association for Computational Linguistics
Pages:
3521–3535
URL:
https://aclanthology.org/2024.emnlp-main.206/
DOI:
10.18653/v1/2024.emnlp-main.206
Cite (ACL):
Wenlong Fei, Xiaohua Wang, Min Hu, Qingyu Zhang, and Hongbo Li. 2024. MTLS: Making Texts into Linguistic Symbols. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 3521–3535, Miami, Florida, USA. Association for Computational Linguistics.
Cite (Informal):
MTLS: Making Texts into Linguistic Symbols (Fei et al., EMNLP 2024)
PDF:
https://aclanthology.org/2024.emnlp-main.206.pdf