Qiao-Ying He
2025
Toward Traditional Chinese ModernBERT: A Preliminary Study
Yi-En Chen | Qiao-Ying He | Kuan-Yu Chen
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
This study employs several state-of-the-art techniques, including RoPE and Flash Attention, and leverages large-scale Chinese web corpora and encyclopedic data to pre-train an encoder model specifically designed for long Traditional Chinese text. We evaluate the model on tasks such as reading comprehension and text classification, and the results show that its overall performance lags behind existing Chinese baselines. Through pseudo-perplexity analysis, we infer that the pre-training phase did not sufficiently capture the data distribution, potentially due to factors such as hyperparameter choices, incomplete convergence, and data quality. Although the results are suboptimal, this study still offers valuable experimental insights and directions for improving Chinese language model development.
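The pseudo-perplexity analysis mentioned in the abstract can be illustrated with a short sketch: each token is masked in turn, the masked language model scores the original token at that position, and the exponentiated negative mean log-likelihood gives the pseudo-perplexity. The snippet below is a minimal illustration using the Hugging Face transformers API; the checkpoint name `bert-base-chinese` is a placeholder, not the model trained in this paper.

```python
# Minimal pseudo-perplexity (PPPL) sketch for a masked-LM encoder.
# Assumption: the checkpoint name is a stand-in, not the paper's model.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def pseudo_perplexity(text, model_name="bert-base-chinese"):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForMaskedLM.from_pretrained(model_name).eval()

    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    log_probs = []
    # Mask one token at a time and score the original token at that position
    # (positions 0 and -1 are skipped, assuming [CLS]/[SEP] special tokens).
    for i in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs.append(torch.log_softmax(logits, dim=-1)[input_ids[i]].item())
    # PPPL = exp(-mean log-likelihood over all masked positions)
    return float(torch.exp(torch.tensor(-sum(log_probs) / len(log_probs))))

print(pseudo_perplexity("今天天氣很好"))
```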
Cross-user Collaborative and Sequential Modeling for Recommendation
Qiao-Ying He | Yi-En Chen | Kuan-Yu Chen
Proceedings of the 37th Conference on Computational Linguistics and Speech Processing (ROCLING 2025)
Multi-behavior recommendation leverages auxiliary behaviors to effectively alleviate the sparsity of target behaviors. Existing approaches can be broadly categorized into two paradigms: sequential models that capture individual temporal dynamics but often omit cross-user information, and graph-based models that mine collaborative patterns yet lack temporal dependency modeling. To address these limitations, this paper proposes an integrated approach that combines sequential and graph modeling: the former focuses on learning temporal dependencies within user behavior sequences, while the latter captures cross-user behavior paths. By fusing the predictions from both components, the method achieves more accurate recommendations. Experiments on two e-commerce datasets, Taobao and RetailRocket, show that the integrated model outperforms the strong baseline MB-STR by about 1% in both HR@10 and NDCG@10. These results indicate that incorporating cross-user collaborative information consistently improves performance, even on top of strong sequential models.
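As a rough illustration of the fusion and evaluation described in the abstract, the sketch below combines per-item scores from a sequential component and a graph component and computes HR@10 and NDCG@10 for a single held-out item. The weighted-sum fusion and the alpha value are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: late fusion of two score vectors plus HR@k / NDCG@k.
# Assumption: a simple weighted sum is used only for illustration.
import numpy as np

def fuse_scores(seq_scores, graph_scores, alpha=0.5):
    """Combine item scores from the sequential and graph components."""
    return alpha * seq_scores + (1.0 - alpha) * graph_scores

def hr_and_ndcg_at_k(scores, target_item, k=10):
    """scores: (num_items,) array; target_item: index of the held-out item."""
    top_k = np.argsort(-scores)[:k]
    if target_item not in top_k:
        return 0.0, 0.0
    rank = int(np.where(top_k == target_item)[0][0])  # 0-based rank in top-k
    # With a single relevant item, IDCG = 1, so NDCG@k = 1 / log2(rank + 2).
    return 1.0, 1.0 / np.log2(rank + 2)

# Toy usage: 100 candidate items, one held-out target.
rng = np.random.default_rng(0)
seq, graph = rng.random(100), rng.random(100)
hr, ndcg = hr_and_ndcg_at_k(fuse_scores(seq, graph), target_item=7)
print(hr, ndcg)
```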