Hongbo Li


2024

MTLS: Making Texts into Linguistic Symbols
Wenlong Fei | Xiaohua Wang | Min Hu | Qingyu Zhang | Hongbo Li
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing

In linguistics, all languages can be considered symbolic systems, with each language relying on symbolic processes to associate specific symbols with meanings. Within a single language, there is a fixed correspondence between linguistic symbols and meanings. Across languages, universal meanings follow varying rules of symbolization, in one-to-one correspondence with symbols. Most work overlooks the properties of languages as symbol systems. In this paper, we shift the focus to these symbolic properties and introduce MTLS, a pre-training method that improves the multilingual capability of models by Making Texts into Linguistic Symbols. First, we replace the vocabulary in pre-trained language models by mapping relations between linguistic symbols and semantics. Then, universal semantics within the symbolic system serve as bridges, linking symbols from different languages to the embedding space of the model and thereby enabling the model to process linguistic symbols. To evaluate the effectiveness of MTLS, we conducted experiments on multilingual tasks using BERT and RoBERTa as backbones. The results indicate that, despite using just over 12,000 pieces of English data in pre-training, the improvement that MTLS brings to multilingual capabilities is significant.

2008

Feature Weighting Random Forest for Detection of Hidden Web Search Interfaces
Yunming Ye | Hongbo Li | Xiaobai Deng | Joshua Zhexue Huang
International Journal of Computational Linguistics & Chinese Language Processing, Volume 13, Number 4, December 2008