Shicheng Li


pdf bib
No Stock is an Island: Learning Internal and Relational Attributes of Stocks with Contrastive Learning
Shicheng Li | Wei Li | Zhiyuan Zhang | Ruihan Bao | Keiko Harimoto | Xu Sun
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Previous work has demonstrated the viability of applying deep learning techniques in the financial area. Recently, the task of stock embedding learning has been drawing attention from the research community, which aims to represent the characteristics of stocks with distributed vectors that can be used in various financial analysis scenarios. Existing approaches for learning stock embeddings either require expert knowledge, or mainly focus on the textual part of information corresponding to individual temporal movements. In this paper, we propose to model stock properties as the combination of internal attributes and relational attributes, which takes into consideration both the time-invariant properties of individual stocks and their movement patterns in relation to the market. To learn the two types of attributes from financial news and transaction data, we design several training objectives based on contrastive learning to extract and separate the long-term and temporary information in the data that are able to counter the inherent randomness of the stock market. Experiments and further analyses on portfolio optimization reveal the effectiveness of our method in extracting comprehensive stock information from various data sources.


pdf bib
Multi-Granularity Contrasting for Cross-Lingual Pre-Training
Shicheng Li | Pengcheng Yang | Fuli Luo | Jun Xie
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

pdf bib
Rethinking Denoised Auto-Encoding in Language Pre-Training
Fuli Luo | Pengcheng Yang | Shicheng Li | Xuancheng Ren | Xu Sun | Songfang Huang | Fei Huang
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Pre-trained self-supervised models such as BERT have achieved striking success in learning sequence representations, especially for natural language processing. These models typically corrupt the given sequences with certain types of noise, such as masking, shuffling, or substitution, and then try to recover the original input. However, such pre-training approaches are prone to learning representations that are covariant with the noise, leading to the discrepancy between the pre-training and fine-tuning stage. To remedy this, we present ContrAstive Pre-Training (CAPT) to learn noise invariant sequence representations. The proposed CAPT encourages the consistency between representations of the original sequence and its corrupted version via unsupervised instance-wise training signals. In this way, it not only alleviates the pretrain-finetune discrepancy induced by the noise of pre-training, but also aids the pre-trained model in better capturing global semantics of the input via more effective sentence-level supervision. Different from most prior work that focuses on a particular modality, comprehensive empirical evidence on 11 natural language understanding and cross-modal tasks illustrates that CAPT is applicable for both language and vision-language tasks, and obtains surprisingly consistent improvement, including 0.6% absolute gain on GLUE benchmarks and 0.8% absolute increment on NLVR2.