Yuchen Wang

Also published as: 雨晨


TeamShakespeare at SemEval-2023 Task 6: Understand Legal Documents with Contextualized Large Language Models
Xin Jin | Yuchen Wang
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The growth of pending legal cases in populous countries, such as India, has become a major issue. Developing effective techniques to process and understand legal documents is extremely useful in resolving this problem. In this paper, we present our systems for SemEval-2023 Task 6: understanding legal texts (Modi et al., 2023). Specifically, we first develop the Legal-BERT-HSLN model that considers the comprehensive context information in both intra- and inter-sentence levels to predict rhetorical roles (subtask A) and then train a Legal-LUKE model, which is legal-contextualized and entity-aware, to recognize legal entities (subtask B). Our evaluations demonstrate that our designed models are more accurate than baselines, e.g., with an up to 15.0% better F1 score in subtask B. We achieved notable performance in the task leaderboard, e.g., 0.834 micro F1 score, and ranked No.5 out of 27 teams in subtask A.


Whodunit? Learning to Contrast for Authorship Attribution
Bo Ai | Yuchen Wang | Yugin Tan | Samson Tan
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Authorship attribution is the task of identifying the author of a given text. The key is finding representations that can differentiate between authors. Existing approaches typically use manually designed features that capture a dataset’s content and style, but these approaches are dataset-dependent and yield inconsistent performance across corpora. In this work, we propose to learn author-specific representations by fine-tuning pre-trained generic language representations with a contrastive objective (Contra-X). We show that Contra-X learns representations that form highly separable clusters for different authors. It advances the state-of-the-art on multiple human and machine authorship attribution benchmarks, enabling improvements of up to 6.8% over cross-entropy fine-tuning. However, we find that Contra-X improves overall accuracy at the cost of sacrificing performance for some authors. Resolving this tension will be an important direction for future work. To the best of our knowledge, we are the first to integrate contrastive learning with pre-trained language model fine-tuning for authorship attribution.


基于强负采样的词嵌入优化算法(Word Embedding Optimization Based on Hard Negative Sampling)
Yuchen Wang (王雨晨) | Miaozhe Lin (林淼哲) | Jiefan Zhan (詹杰凡)
Proceedings of the 19th Chinese National Conference on Computational Linguistics



Analyzing the Quality of Counseling Conversations: the Tell-Tale Signs of High-quality Counseling
Verónica Pérez-Rosas | Xuetong Sun | Christy Li | Yuchen Wang | Kenneth Resnicow | Rada Mihalcea
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)