Andi Wu


2010

Treebank of Chinese Bible Translations
Andi Wu
CIPS-SIGHAN Joint Conference on Chinese Language Processing

2006

A Hebrew Tree Bank Based on Cantillation Marks
Andi Wu | Kirk Lowery
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In the Masoretic text of the Hebrew Bible (HB), the cantillation marks function like a punctuation system that shows the division and subdivision of each verse, forming a tree structure which is similar to the prosodic tree in modern linguistics. However, in the Masoretic text, the structure is hidden in a complicated set of diacritic symbols and the rich information is accessible only to a few trained scholars. In order to make the structural information available to the general public and to automatic processing by the computer, we built a tree bank where the hierarchical structure of each HB verse is explicitly represented in XML format. We coded the punctuation system in a context-free grammar (CFG) which was then used by a CYK parser to automatically generate trees for the whole HB. The results show that (1) the CFG correctly encoded the annotation rules and (2) the annotation done by the Masoretes is highly consistent.
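
As a rough illustration of the pipeline the abstract describes (a grammar over accent classes, a CYK parser, and trees serialized as XML), here is a minimal sketch. The grammar, the token classes `conj` and `disj`, the labels `Verse`, `Phrase`, and `Conj`, and the function names are all hypothetical and chosen only for illustration. They are not the authors' grammar, parser, or XML schema.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (NOT the paper's grammar).
# Lexical rules: token class -> nonterminals that can produce it.
LEXICAL = {
    "conj": {"Conj"},    # word carrying a conjunctive accent (assumed class)
    "disj": {"Phrase"},  # word carrying a disjunctive accent (assumed class)
}
# Binary rules: (left child, right child) -> possible parent nonterminals.
BINARY = {
    ("Conj", "Phrase"): {"Phrase"},   # a conjunctive word joins the next phrase
    ("Phrase", "Phrase"): {"Verse"},  # two phrases make up a verse
}


def cyk_parse(tokens, lexical, binary, start):
    """Return one parse tree rooted in `start` as nested tuples, or None."""
    n = len(tokens)
    # chart[(i, j)] maps a nonterminal to one tree covering tokens[i:j].
    chart = defaultdict(dict)
    for i, tok in enumerate(tokens):
        for sym in lexical.get(tok, ()):
            chart[(i, i + 1)][sym] = (sym, tok)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # try every split point
                for b, left in chart[(i, k)].items():
                    for c, right in chart[(k, j)].items():
                        for a in binary.get((b, c), ()):
                            # Keep the first tree found for each nonterminal.
                            chart[(i, j)].setdefault(a, (a, left, right))
    return chart[(0, n)].get(start)


def to_xml(tree):
    """Serialize a nested-tuple tree as a simple XML string."""
    label = tree[0]
    if len(tree) == 2:  # leaf: (label, token)
        return f"<{label}>{tree[1]}</{label}>"
    children = "".join(to_xml(child) for child in tree[1:])
    return f"<{label}>{children}</{label}>"


if __name__ == "__main__":
    tree = cyk_parse(["conj", "disj", "disj"], LEXICAL, BINARY, "Verse")
    print(to_xml(tree))
```

A sequence the toy grammar cannot derive (for example a lone `conj`) yields `None`. In a setup like the one the abstract reports, such parse failures would be one way to flag verses whose marking departs from the encoded rules.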

From Prosodic Trees to Syntactic Trees
Andi Wu | Kirk Lowery
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

2005

Chinese Word Segmentation and Named Entity Recognition: A Pragmatic Approach
Jianfeng Gao | Mu Li | Andi Wu | Chang-Ning Huang
Computational Linguistics, Volume 31, Number 4, December 2005

2004

Adaptive Chinese Word Segmentation
Jianfeng Gao | Andi Wu | Mu Li | Chang-Ning Huang | Hongqiao Li | Xinsong Xia | Haowei Qin
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Customizable Segmentation of Morphologically Derived Words in Chinese
Andi Wu
International Journal of Computational Linguistics & Chinese Language Processing, Volume 8, Number 1, February 2003: Special Issue on Word Formation and Chinese Language Processing

Learning Verb-Noun Relations to Improve Parsing
Andi Wu
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

Chinese Word Segmentation in MSR-NLP
Andi Wu
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing

2002

Dynamic Lexical Acquisition in Chinese Sentence Analysis
Andi Wu | Joseph Pentheroudakis | Zixin Jiang
COLING 2002: The 17th International Conference on Computational Linguistics: Project Notes

2001

Multilingual Sentence Generation
Takako Aikawa | Maite Melero | Lee Schwartz | Andi Wu
Proceedings of the ACL 2001 Eighth European Workshop on Natural Language Generation (EWNLG)

Generation for multilingual MT
Takako Aikawa | Maite Melero | Lee Schwartz | Andi Wu
Proceedings of Machine Translation Summit VIII

This paper presents an overview of the broad-coverage, application-independent natural language generation component of the NLP system being developed at Microsoft Research. It demonstrates how this component functions within a multilingual Machine Translation system (MSR-MT), using the languages that we are currently working on (English, Spanish, Japanese, and Chinese). Section 1 provides a system description of MSR-MT. Section 2 focuses on the generation component and its set of core rules. Section 3 describes an additional layer of generation rules with examples that address issues specific to MT. Section 4 presents evaluation results in the context of MSR-MT. Section 5 addresses generation issues outside of MT.

2000

Statistically-Enhanced New Word Identification in a Rule-Based Chinese System
Andi Wu | Zixin Jiang
Second Chinese Language Processing Workshop