Shan Wang

中国语言学研究 70 年:核心期刊的词汇增长(70 Years of Linguistics Research in China: Vocabulary Growth of Core Journals)
Shan Wang (王珊) | Runzhe Zhan (詹润哲) | Shuangyun Yao (姚双云)
Proceedings of the 21st Chinese National Conference on Computational Linguistics

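“Since the founding of the People's Republic of China, linguistics in China has made remarkable achievements over 70 years of development. Existing studies mainly describe this process by reviewing major historical events, and research that analyzes its diachronic development with quantitative methods is still lacking. This paper approaches the topic through vocabulary growth: it builds the first large-scale diachronic corpus of abstracts from Chinese core journals in linguistics and uses three major vocabulary growth models to predict vocabulary change in the corpus. The Heaps model, which gives the best fit, is then used to analyze changes in linguistics vocabulary stage by stage, revealing the guiding role of national policy and the characteristics of language life in particular periods. In addition, a validation procedure that is independent of temporal order supports the validity of the method. Keywords: Chinese linguistics; vocabulary growth; core journals; abstracts; corpus; diachronic development”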


基于依存语法的偷抢类动词研究(Research of Verbs of Stealing and Robbing Based on Dependency Grammar)
Shan Wang (王珊) | Xiaojun Liu (刘晓骏)
Proceedings of the 20th Chinese National Conference on Computational Linguistics


近十年来澳门的词汇增长(Macau’s Vocabulary Growth in the Recent Ten Year)
Shan Wang (王珊) | Zhao Chen (陈钊) | Haodi Zhang (张昊迪)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

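Vocabulary growth models can reflect the diachronic evolution of vocabulary in a given domain by fitting the quantitative relationship between word types and word tokens. Macau is a place where multiple languages and cultures meet, and its vocabulary use can reflect the focus of public attention, yet no study has examined the diachronic evolution of Macau's vocabulary. This paper builds the first diachronic corpus of Macau Chinese, fits the vocabulary change in the corpus with three major vocabulary growth models, and selects the best-performing Heaps model to further analyze the relationship between vocabulary evolution and newspaper content. The results show that trends in Macau's vocabulary are closely tied to major news events, the Macau government's policy agenda, and people's livelihood. The study also validates the method on shuffled texts from which temporal information has been removed. This is the first study to examine Macau's vocabulary evolution on the basis of a large-scale diachronic corpus, and it is significant for understanding the development of language life in Macau.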
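
As a rough illustration of the kind of type-token fitting the abstract above describes, the sketch below fits the standard Heaps' law form V(N) = K · N^β by least squares in log-log space. It is not the authors' code: the function names `vocabulary_growth` and `fit_heaps` are hypothetical, and the paper's actual fitting procedure, tokenization, and corpus are not given here.

```python
import math


def vocabulary_growth(tokens):
    """Return (N, V) pairs: tokens read so far and distinct types seen so far."""
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((n, len(seen)))
    return curve


def fit_heaps(curve):
    """Fit V = K * N**beta via ordinary least squares on log V = log K + beta * log N.

    `curve` is an iterable of (N, V) pairs with N > 0 and V > 0.
    Returns the tuple (K, beta).
    """
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


if __name__ == "__main__":
    # Toy token stream; a real study would use a segmented diachronic corpus.
    sample = "the cat sat on the mat and the dog sat on the log".split()
    k, beta = fit_heaps(vocabulary_growth(sample))
    print(f"K = {k:.3f}, beta = {beta:.3f}")
```

Shuffling the token order before calling `vocabulary_growth` gives a curve without temporal information, which is one plausible reading of the shuffled-text validation the abstract mentions.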

替换类动词的句法语义分析(Syntactic and Semantic Analysis of verbs of Exchange)
Shan Wang (王珊) | Le Wu (吴乐)
Proceedings of the 20th Chinese National Conference on Computational Linguistics


回避类动词的句法语义(The Syntax and Semantics of Verbs of Avoiding)
Shan Wang (王珊) | Xiaojun Liu (刘晓骏)
Proceedings of the 20th Chinese National Conference on Computational Linguistics


欺骗类动词的句法语义研究(On the Syntax and Semantics of Verbs of Cheating)
Shan Wang (王珊) | Jie Zhou (周洁)
Proceedings of the 20th Chinese National Conference on Computational Linguistics



Identifying Idioms in Chinese Translations
Wan Yu Ho | Christine Kng | Shan Wang | Francis Bond
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Optimally, a translated text should preserve information while maintaining the writing style of the original. When this is not possible, as is often the case with figurative speech, a common practice is to simplify and make explicit the implications. However, in our investigations of translations from English to another language, English-to-Chinese texts were often found to include idiomatic expressions (usually in the form of Chengyu 成语) where there were originally no idiomatic, metaphorical, or even figurative expressions. We have created an initial small lexicon of Chengyu, which we can use to find all occurrences of Chengyu in a given corpus, and will continue to expand the database. By examining the rates and patterns of occurrence across four genres in the NTU Multilingual Corpus, a resource may be created to aid machine translation or, going further, predict Chinese translational trends in any given genre.
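
The lexicon lookup mentioned in the abstract above could be sketched as a simple substring scan. This is only an assumption about how such a search might work, not the authors' tool: `find_chengyu` is a hypothetical name, and the two lexicon entries are illustrative examples rather than items from the paper's lexicon.

```python
def find_chengyu(text, lexicon):
    """Return (offset, entry) for every lexicon entry found in `text`.

    At each character offset the longest matching entry is reported,
    so the scan works on unsegmented Chinese text.
    """
    lengths = sorted({len(entry) for entry in lexicon}, reverse=True)
    hits = []
    for start in range(len(text)):
        for length in lengths:
            candidate = text[start:start + length]
            if candidate in lexicon:
                hits.append((start, candidate))
                break
    return hits


if __name__ == "__main__":
    # Illustrative entries only; a real lexicon would be much larger.
    lexicon = {"一石二鸟", "画蛇添足"}
    sentence = "他这样做是画蛇添足,而不是一石二鸟。"
    for offset, entry in find_chengyu(sentence, lexicon):
        print(offset, entry)
```

Counting the hits per genre and dividing by corpus size would give occurrence rates of the kind the abstract compares across genres.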

Building The Sense-Tagged Multilingual Parallel Corpus
Shan Wang | Francis Bond
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Sense-annotated parallel corpora play a crucial role in natural language processing. This paper introduces our progress in creating such a corpus for Asian languages using English as a pivot, which is the first such corpus for these languages. Two sets of tools have been developed for sequential and targeted tagging, which are also easy to set up for any new language in addition to those we are annotating. This paper also briefly presents the general guidelines for doing this project. The current results of monolingual sense-tagging and multilingual linking are illustrated, which indicate the differences among genres and language pairs. All the tools, guidelines and the manually annotated corpus will be freely available at

Issues in building English-Chinese parallel corpora with WordNets.
Francis Bond | Shan Wang
Proceedings of the Seventh Global Wordnet Conference


Developing Parallel Sense-tagged Corpora with Wordnets
Francis Bond | Shan Wang | Eshley Huini Gao | Hazel Shuwen Mok | Jeanette Yiwen Tan
Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse

Building the Chinese Open Wordnet (COW): Starting from Core Synsets
Shan Wang | Francis Bond
Proceedings of the 11th Workshop on Asian Language Resources


Compositionality of NN Compounds: A Case Study on [N1+Artifactual-Type Event Nouns]
Shan Wang | Chu-Ren Huang | Hongzhi Xu
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

Type Construction of Event Nouns in Mandarin Chinese
Shan Wang | Chu-Ren Huang
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation


Compound Event Nouns of the ‘Modifier-head’ Type in Mandarin Chinese
Shan Wang | Chu-Ren Huang
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation


Adjectival Modification to Nouns in Mandarin Chinese: Case Studies on “cháng+noun” and “adjective+tú shū guǎn”
Shan Wang | Chu-Ren Huang
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

Compositional Operations of Mandarin Chinese Perception Verb “kàn”: A Generative Lexicon Approach
Shan Wang | Chu-Ren Huang
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation


Classifying Temporal Relations Between Events
Nathanael Chambers | Shan Wang | Dan Jurafsky
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions