Dekai Wu - ACL Anthology

Dekai Wu

2025

Defense Against Prompt Injection Attack by Leveraging Attack Techniques
Yulin Chen | Haoran Li | Zihao Zheng | Dekai Wu | Yangqiu Song | Bryan Hooi
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With the advancement of technology, large language models (LLMs) have achieved remarkable performance across various natural language processing (NLP) tasks, powering LLM-integrated applications like Microsoft Copilot. However, as LLMs continue to evolve, new vulnerabilities, especially prompt injection attacks, arise. These attacks trick LLMs into deviating from the original input instructions and executing the attacker’s instructions injected in data content, such as retrieved results. Recent attack methods leverage LLMs’ instruction-following abilities and their inability to distinguish instructions injected in the data content, and achieve a high attack success rate (ASR). When comparing the attack and defense methods, we interestingly find that they share similar design goals: inducing the model to ignore unwanted instructions and instead execute wanted instructions. Therefore, we raise an intuitive question: *Could these attack techniques be utilized for defensive purposes?* In this paper, we invert the intention of prompt injection methods to develop novel defense methods based on previous training-free attack methods, by repeating the attack process but with the original input instruction rather than the injected instruction. Our comprehensive experiments demonstrate that our defense techniques outperform existing defense approaches, achieving state-of-the-art results.

2020

Proceedings of the 17th International Conference on Spoken Language Translation
Marcello Federico | Alex Waibel | Kevin Knight | Satoshi Nakamura | Hermann Ney | Jan Niehues | Sebastian Stüker | Dekai Wu | Joseph Mariani | Francois Yvon
Proceedings of the 17th International Conference on Spoken Language Translation

2019

Efficient Bilingual Generalization from Neural Transduction Grammar Induction
Yuchen Yan | Dekai Wu | Serkan Kumyol
Proceedings of the 16th International Conference on Spoken Language Translation

We introduce (1) a novel neural network structure for bilingual modeling of sentence pairs that allows efficient capturing of bilingual relationships via biconstituent composition, (2) the concept of neural network biparsing, which applies not only to machine translation (MT) but also to a variety of other bilingual research areas, and (3) the concept of a biparsing-backpropagation training loop, which we hypothesize can efficiently learn complex biparse tree patterns. Our work differs from sequential attention-based models, which are more traditionally found in neural machine translation (NMT), in three aspects. First, our model enforces compositional constraints. Second, our model has a smaller search space in terms of discovering bilingual relationships from bilingual sentence pairs. Third, our model produces explicit biparse trees, which enable transparent error analysis during evaluation and external tree constraints during training.

2018

SRL for low resource languages isn’t needed for semantic SMT
Meriem Beloucif | Dekai Wu
Proceedings of the 21st Annual Conference of the European Association for Machine Translation

Previous attempts at injecting semantic frame biases into SMT training for low resource languages failed because either (a) no semantic parser is available for the low resource input language; or (b) the output English language semantic parses excise relevant parts of the alignment space too aggressively. We present the first semantic SMT model to succeed in significantly improving translation quality across many low resource input languages for which no automatic SRL is available, consistently and across all common MT metrics. The results we report are the best by far to date for this type of approach; our analyses suggest that in general, easier approaches toward including semantics in training SMT models may be more feasible than generally assumed even for low resource languages where semantic parsers remain scarce. While recent proposals to use the crosslingual evaluation metric XMEANT during inversion transduction grammar (ITG) induction are inapplicable to low resource languages that lack semantic parsers, we break the bottleneck via a vastly improved method of biasing ITG induction toward learning more semantically correct alignments using the monolingual semantic evaluation metric MEANT. Unlike XMEANT, MEANT requires only a readily-available English (output language) semantic parser. The advances we report here exploit the novel realization that MEANT represents an excellent way to semantically bias expectation-maximization induction even for low resource languages. We test our systems on challenging languages including Amharic, Uyghur, Tigrinya and Oromo. Results show that our model influences the learning towards more semantically correct alignments, leading to better translation quality than either the standard ITG or GIZA++ based SMT training models on different datasets.

2016

Driving inversion transduction grammar induction with semantic evaluation
Meriem Beloucif | Dekai Wu
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics

Learning Translations for Tagged Words: Extending the Translation Lexicon of an ITG for Low Resource Languages
Markus Saers | Dekai Wu
Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP

Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)
Dekai Wu | Pushpak Bhattacharyya
Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016)

Improving word alignment for low resource languages using English monolingual SRL
Meriem Beloucif | Markus Saers | Dekai Wu
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

We introduce a new statistical machine translation approach specifically geared to learning translation from low resource languages, that exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that in contrast to conventional statistical machine translation (SMT) training methods, which rely heavily on phrase memorization, our approach focuses on learning bilingual correlations that help translate low resource languages, by using the output language semantic structure to further narrow down ITG constraints. This approach is motivated by previous research which has shown that injecting a semantic frame based objective function while training SMT models improves the translation quality. We show that including a monolingual semantic objective function during the learning of the translation model leads towards a semantically driven alignment which is more efficient than simply tuning loglinear mixture weights against a semantic frame based evaluation metric in the final stage of statistical machine translation training. We test our approach with three different language pairs and demonstrate that our model biases the learning towards more semantically correct alignments. Both GIZA++ and ITG based techniques fail to capture meaningful bilingual constituents, which are required when trying to learn translation models for low resource languages. In contrast, our proposed model not only improves translation by injecting a monolingual objective function to learn bilingual correlations during early training of the translation model, but also helps to learn more meaningful correlations with a relatively small data set, leading to a better alignment compared to either conventional ITG or traditional GIZA++ based approaches.

2015

Improving semantic SMT via soft semantic role label constraints on ITG alignments
Meriem Beloucif | Markus Saers | Dekai Wu
Proceedings of Machine Translation Summit XV: Papers

Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation
Dekai Wu | Marine Carpuat | Eneko Agirre | Nora Aranberri
Proceedings of the Ninth Workshop on Syntax, Semantics and Structure in Statistical Translation

Improving evaluation and optimization of MT systems against MEANT
Chi-kiu Lo | Philipp Dowling | Dekai Wu
Proceedings of the Tenth Workshop on Statistical Machine Translation

2014

Improving MEANT based semantically tuned SMT
Meriem Beloucif | Chi-kiu Lo | Dekai Wu
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

We discuss various improvements to our MEANT tuned system, previously presented at IWSLT 2013. In our 2014 system, we incorporate this year’s improved version of MEANT, improved Chinese word segmentation, Chinese named entity recognition and dedicated proper name translation, and number expression handling. This results in a significant performance jump compared to last year’s system. We also ran preliminary experiments on tuning to IMEANT, our new ITG based variant of MEANT. The performance of tuning to IMEANT is comparable to tuning on MEANT (differences are statistically insignificant). We are presently investigating if tuning on IMEANT can produce even better results, since IMEANT was actually shown to correlate with human adequacy judgment more closely than MEANT. Finally, we ran experiments applying our new architectural improvements to a contrastive system tuned to BLEU. We observed a slightly higher jump in comparison to last year, possibly due to mismatches of MEANT’s similarity models to our new entity handling.

Evaluating Improvised Hip Hop Lyrics - Challenges and Observations
Karteek Addanki | Dekai Wu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We investigate novel challenges involved in comparing model performance on the task of improvising responses to hip hop lyrics and discuss observations regarding inter-evaluator agreement on judging improvisation quality. We believe the analysis serves as a first step toward designing robust evaluation strategies for improvisation tasks, a relatively neglected area to date. Unlike most natural language processing tasks, improvisation tasks suffer from a high degree of subjectivity, making it difficult to design discriminative evaluation strategies to drive model development. We propose a simple strategy with fluency and rhyming as the criteria for evaluating the quality of generated responses, which we apply to both our inversion transduction grammar based FREESTYLE hip hop challenge-response improvisation system, as well as various contrastive systems. We report inter-evaluator agreement for both English and French hip hop lyrics, and analyze correlation with challenge length. We also compare the extent of agreement in evaluating fluency with that of rhyming, and quantify the difference in agreement with and without precise definitions of evaluation criteria.

On the reliability and inter-annotator agreement of human semantic MT evaluation via HMEANT
Chi-kiu Lo | Dekai Wu
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We present analyses showing that HMEANT is a reliable, accurate and fine-grained semantic frame based human MT evaluation metric with high inter-annotator agreement (IAA) and correlation with human adequacy judgments, despite only requiring a minimal training of about 15 minutes for lay annotators. Previous work shows that the IAA on the semantic role labeling (SRL) subtask within HMEANT is over 70%. In this paper we focus on (1) the IAA on the semantic role alignment task and (2) the overall IAA of HMEANT. Our results show that the IAA on the alignment task of HMEANT is over 90% when humans align SRL output from the same SRL annotator, which shows that the instructions on the alignment task are sufficiently precise, although the overall IAA where humans align SRL output from different SRL annotators falls to only 61% due to the pipeline effect on the disagreement in the two annotation tasks. We show that, instead of manually aligning the semantic roles, using an automatic algorithm not only helps maintain the overall IAA of HMEANT at 70%, but also provides a finer-grained assessment on the phrasal similarity of the semantic role fillers. This suggests that HMEANT equipped with automatic alignment is reliable and accurate for humans to evaluate MT adequacy while achieving higher correlation with human adequacy judgments than HTER.

XMEANT: Better semantic MT evaluation without reference translations
Chi-kiu Lo | Meriem Beloucif | Markus Saers | Dekai Wu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation
Dekai Wu | Marine Carpuat | Xavier Carreras | Eva Maria Vecchi
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

Better Semantic Frame Based MT Evaluation via Inversion Transduction Grammars
Dekai Wu | Chi-kiu Lo | Meriem Beloucif | Markus Saers
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

Ternary Segmentation for Improving Search in Top-down Induction of Segmental ITGs
Markus Saers | Dekai Wu
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

Transduction Recursive Auto-Associative Memory: Learning Bilingual Compositional Distributed Vector Representations of Inversion Transduction Grammars
Karteek Addanki | Dekai Wu
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

Lexical Access Preference and Constraint Strategies for Improving Multiword Expression Association within Semantic MT Evaluation
Dekai Wu | Chi-kiu Lo | Markus Saers
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

2013

What SMT Learns
Dekai Wu
Proceedings of the Workshop on Twenty Years of Bitext

Human semantic MT evaluation with HMEANT for IWSLT 2013
Chi-kiu Lo | Dekai Wu
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

We present the results of large-scale human semantic MT evaluation with HMEANT on the IWSLT 2013 German-English MT and SLT tracks and show that HMEANT evaluates the performance of the MT systems differently compared to BLEU and TER. Together with the references, all the translations are annotated by annotators who are native English speakers in both the semantic role labeling stage and the role filler alignment stage of HMEANT. We obtain high inter-annotator agreement and low annotation time costs, which indicate that it is feasible to run a large-scale human semantic MT evaluation campaign using HMEANT. Our results also show that HMEANT is a robust and reliable semantic MT evaluation metric for running large-scale evaluation campaigns, as it is inexpensive and simple while maintaining the semantic representational transparency to provide a perspective which is different from BLEU and TER in order to understand the performance of the state-of-the-art MT systems.

Improving machine translation into Chinese by tuning against Chinese MEANT
Chi-kiu Lo | Meriem Beloucif | Dekai Wu
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

We present the first ever results showing that Chinese MT output is significantly improved by tuning an MT system against a semantic frame based objective function, MEANT, rather than an n-gram based objective function, BLEU, as measured across commonly used metrics and different test sets. Recent work showed that by preserving the meaning of the translations as captured by semantic frames in the training process, MT systems for translating into English on both formal and informal genres are constrained to produce more adequate translations by making more accurate choices on lexical output and reordering rules. In this paper we describe our experiments in IWSLT 2013 TED talk MT tasks on tuning MT systems against MEANT for translating into Chinese and English respectively. We show that the Chinese translation output benefits more from tuning an MT system against MEANT than the English translation output due to the ambiguous nature of word boundaries in Chinese. Our encouraging results show that using MEANT is a promising alternative to BLEU in both evaluating and tuning MT systems to drive the progress of MT research across different languages.

Unsupervised learning of bilingual categories in inversion transduction grammar induction
Markus Saers | Dekai Wu
Proceedings of the 10th International Workshop on Spoken Language Translation: Papers

We present the first known experiments incorporating unsupervised bilingual nonterminal category learning within end-to-end fully unsupervised transduction grammar induction using matched training and testing models. Despite steady recent progress, such induction experiments until now have not allowed for learning differentiated nonterminal categories. We divide the learning into two stages: (1) a bootstrap stage that generates a large set of categorized short transduction rule hypotheses, and (2) a minimum conditional description length stage that simultaneously prunes away less useful short rule hypotheses, while also iteratively segmenting full sentence pairs into useful longer categorized transduction rules. We show that the second stage works better when the rule hypotheses have categories than when they do not, and that the proposed conditional description length approach combines the rules hypothesized by the two stages better than a mixture model does. We also show that the compact model learned during the second stage can be further improved by combining the result of different iterations in a mixture model. In total, we see a jump in BLEU score, from 17.53 for a standalone minimum description length baseline with no category learning, to 20.93 when incorporating category induction on a Chinese–English translation task.

Can Informal Genres be better Translated by Tuning on Automatic Semantic Metrics?
Chi-Kiu Lo | Dekai Wu
Proceedings of Machine Translation Summit XIV: Papers

Modeling Hip Hop Challenge-Response Lyrics as Machine Translation
Karteek Addanki | Markus Saers | Dekai Wu
Proceedings of Machine Translation Summit XIV: Papers

Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation
Dekai Wu | Karteek Addanki | Markus Saers | Meriem Beloucif
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Bayesian Induction of Bracketing Inversion Transduction Grammars
Markus Saers | Dekai Wu
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Improving machine translation by training against an automatic semantic frame based evaluation metric
Chi-kiu Lo | Karteek Addanki | Markus Saers | Dekai Wu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Segmenting vs. Chunking Rules: Unsupervised ITG Induction via Minimum Conditional Description Length
Markus Saers | Karteek Addanki | Dekai Wu
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation
Marine Carpuat | Lucia Specia | Dekai Wu
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation

Combining Top-down and Bottom-up Search for Unsupervised Induction of Transduction Grammars
Markus Saers | Karteek Addanki | Dekai Wu
Proceedings of the Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation

MEANT at WMT 2013: A Tunable, Accurate yet Inexpensive Semantic Frame Based MT Evaluation Metric
Chi-kiu Lo | Dekai Wu
Proceedings of the Eighth Workshop on Statistical Machine Translation

Unsupervised Transduction Grammar Induction via Minimum Description Length
Markus Saers | Karteek Addanki | Dekai Wu
Proceedings of the Second Workshop on Hybrid Approaches to Translation

Unsupervised Learning of Bilingual Categories in Inversion Transduction Grammar Induction
Markus Saers | Dekai Wu
Proceedings of the 13th International Conference on Parsing Technologies (IWPT 2013)

2012

LTG vs. ITG Coverage of Cross-Lingual Verb Frame Alternations
Karteek Addanki | Chi-kiu Lo | Markus Saers | Dekai Wu
Proceedings of the 16th Annual Conference of the European Association for Machine Translation

From Finite-State to Inversion Transductions: Toward Unsupervised Bilingual Grammar Induction
Markus Saers | Karteek Addanki | Dekai Wu
Proceedings of COLING 2012

Fully Automatic Semantic MT Evaluation
Chi-kiu Lo | Anand Karthik Tumuluru | Dekai Wu
Proceedings of the Seventh Workshop on Statistical Machine Translation

Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation
Marine Carpuat | Lucia Specia | Dekai Wu
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

Towards a Predicate-Argument Evaluation for MT
Ondřej Bojar | Dekai Wu
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

Unsupervised vs. supervised weight estimation for semantic MT evaluation metrics
Chi-kiu Lo | Dekai Wu
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

Accuracy and robustness in measuring the lexical similarity of semantic role fillers for automatic semantic MT evaluation
Anand Karthik Tumuluru | Chi-kiu Lo | Dekai Wu
Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation

2011

Principled Induction of Phrasal Bilexica
Markus Saers | Dekai Wu
Proceedings of the 15th Annual Conference of the European Association for Machine Translation

On the Expressivity of Linear Transductions
Markus Saers | Dekai Wu | Chris Quirk
Proceedings of Machine Translation Summit XIII: Papers

Syntactic SMT and Semantic SMT
Dekai Wu
Proceedings of Machine Translation Summit XIII: Tutorial Abstracts

Over the past twenty years, we have attacked the historical methodological barriers between statistical machine translation and traditional models of syntax, semantics, and structure. In this tutorial, we will survey some of the central issues and techniques from each of these aspects, with an emphasis on 'deeply theoretically integrated' models, rather than hybrid approaches such as superficial statistical aggregation or system combination of outputs produced by traditional symbolic components. On syntactic SMT, we will explore the trade-offs for SMT between learnability and representational expressiveness. After establishing a foundation in the theory and practice of stochastic transduction grammars, we will examine very recent new approaches to automatic unsupervised induction of various classes of transduction grammars. We will show why stochastic linear transduction grammars (LTGs and LITGs) and their preterminalized variants (PLITGs) are proving to be particularly intriguing models for the bootstrapping of inducing full-fledged stochastic inversion transduction grammars (ITGs). On semantic SMT, we will explore the trade-offs for SMT involved in applying various lexical semantics models. We will first examine word sense disambiguation, and discuss why traditional WSD models that are not deeply integrated within the SMT model tend, surprisingly, to fail. In contrast, we will show how a deeply embedded phrase sense disambiguation (PSD) approach succeeds where traditional WSD does not. We will then turn to semantic role labeling, and discuss the challenges of early approaches of applying SRL models to SMT. Finally, on semantic MT evaluation, we will explore some very new human and semi-automatic metrics based on semantic frame agreement. We show that by keeping the metrics deeply grounded within the theoretical framework of semantic frames, the new HMEANT and MEANT metrics can significantly outperform even the state-of-the-art expensive HTER and TER metrics, while at the same time maintaining the desirable characteristics of simplicity, inexpensiveness, and representational transparency.

Mining Parallel Documents Using Low Bandwidth and High Precision CLIR from the Heterogeneous Web
Simon Shi | Pascale Fung | Emmanuel Prochasson | Chi-kiu Lo | Dekai Wu
Proceedings of 5th International Joint Conference on Natural Language Processing

MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles
Chi-kiu Lo | Dekai Wu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Linear Transduction Grammars and Zipper Finite-State Transducers
Markus Saers | Dekai Wu
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation
Dekai Wu | Marianna Apidianaki | Marine Carpuat | Lucia Specia
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

Structured vs. Flat Semantic Role Representations for Machine Translation Evaluation
Chi-kiu Lo | Dekai Wu
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

Reestimation of Reified Rules in Semiring Parsing and Biparsing
Markus Saers | Dekai Wu
Proceedings of Fifth Workshop on Syntax, Semantics and Structure in Statistical Translation

2010

Evaluating Machine Translation Utility via Semantic Role Labels
Chi-kiu Lo | Dekai Wu
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present the methodology that underlies new metrics for semantic machine translation evaluation we are developing. Unlike widely-used lexical and n-gram based MT evaluation metrics, the aim of semantic MT evaluation is to measure the utility of translations. We discuss the design of empirical studies to evaluate the utility of machine translation output by assessing the accuracy for key semantic roles. These roles are from the English 5W templates (who, what, when, where, why) used in recent GALE distillation evaluations. Recent work by Wu and Fung (2009) introduced semantic role labeling into statistical machine translation to enhance the quality of MT output. However, this approach has so far only been evaluated using lexical and n-gram based SMT evaluation metrics like BLEU, which are not aimed at evaluating the utility of MT output. Direct data analyses are still needed to understand how semantic models can be leveraged to evaluate the utility of MT output. In this paper, we discuss a new methodology for evaluating the utility of the machine translation output, by assessing the accuracy with which human readers are able to complete the English 5W templates.

Word Alignment with Stochastic Bracketing Linear Inversion Transduction Grammar
Markus Saers | Joakim Nivre | Dekai Wu
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Linear Inversion Transduction Grammar Alignments as a Second Translation Path
Markus Saers | Joakim Nivre | Dekai Wu
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation
Dekai Wu
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

A Systematic Comparison between Inversion Transduction Grammar and Linear Transduction Grammar for Word Alignment
Markus Saers | Joakim Nivre | Dekai Wu
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

Semantic vs. Syntactic vs. N-gram Structure for Machine Translation Evaluation
Chi-kiu Lo | Dekai Wu
Proceedings of the 4th Workshop on Syntax and Structure in Statistical Translation

2009

Can Semantic Role Labeling Improve SMT?
Dekai Wu | Pascale Fung
Proceedings of the 13th Annual Conference of the European Association for Machine Translation

Semantic Roles for SMT: A Hybrid Two-Pass Model
Dekai Wu | Pascale Fung
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers

Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009
Dekai Wu | David Chiang
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

Improving Phrase-Based Translation via Word Alignments from Stochastic Inversion Transduction Grammars
Markus Saers | Dekai Wu
Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation (SSST-3) at NAACL HLT 2009

Learning Stochastic Bracketing Inversion Transduction Grammars with a Cubic Time Biparsing Algorithm
Markus Saers | Joakim Nivre | Dekai Wu
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

Empirical lower bounds on translation unit error rate for the full class of inversion transduction grammars
Anders Søgaard | Dekai Wu
Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09)

2008

Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation
Marine Carpuat | Dekai Wu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We present new direct data analysis showing that dynamically-built context-dependent phrasal translation lexicons are more useful resources for phrase-based statistical machine translation (SMT) than conventional static phrasal translation lexicons, which ignore all contextual information. After several years of surprising negative results, recent work suggests that context-dependent phrasal translation lexicons are an appropriate framework to successfully incorporate Word Sense Disambiguation (WSD) modeling into SMT. However, this approach has so far only been evaluated using automatic translation quality metrics, which are important, but aggregate many different factors. A direct analysis is still needed to understand how context-dependent phrasal translation lexicons impact translation quality, and whether the additional complexity they introduce is really necessary. In this paper, we focus on the impact of context-dependent translation lexicons on lexical choice in phrase-based SMT and show that context-dependent lexicons are more useful to a phrase-based SMT system than a conventional lexicon. A typical phrase-based SMT system makes use of more and longer phrases with context modeling, including phrases that were not seen very frequently in training. Even when the segmentation is identical, the context-dependent lexicons yield translations that match references more often than conventional lexicons.

Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)
David Chiang | Dekai Wu
Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)

2007

HKUST statistical machine translation experiments for IWSLT 2007
Yihai Shen | Chi-kiu Lo | Marine Carpuat | Dekai Wu
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper describes the HKUST experiments in the IWSLT 2007 evaluation campaign on spoken language translation. Our primary objective was to compare the open-source phrase-based statistical machine translation toolkit Moses against Pharaoh. We focused on Chinese to English translation, but we also report results on the Arabic to English, Italian to English, and Japanese to English tasks.

Context-dependent phrasal translation lexicons for statistical machine translation
Marine Carpuat | Dekai Wu
Proceedings of Machine Translation Summit XI: Papers

How phrase sense disambiguation outperforms word sense disambiguation for statistical machine translation
Marine Carpuat | Dekai Wu
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

Learning bilingual semantic frames: shallow semantic parsing vs. semantic role projection
Pascale Fung | Zhaojun Wu | Yongsheng Yang | Dekai Wu
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

Improving Statistical Machine Translation Using Word Sense Disambiguation
Marine Carpuat | Dekai Wu
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation
Dekai Wu | David Chiang
Proceedings of SSST, NAACL-HLT 2007 / AMTA Workshop on Syntax and Structure in Statistical Translation

2006

Toward integrating word sense and entity disambiguation into statistical machine translation
Marine Carpuat | Yihai Shen | Xiaofeng Yu | Dekai Wu
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

A Grammatical Approach to Understanding Textual Tables Using Two-Dimensional SCFGs
Dekai Wu | Ken Wing Kuen Lee
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions

Boosting for Chinese Named Entity Recognition
Xiaofeng Yu | Marine Carpuat | Dekai Wu
Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing

2005

Inversion Transduction Grammar Constraints for Mining Parallel Sentences from Quasi-Comparable Corpora
Dekai Wu | Pascale Fung
Second International Joint Conference on Natural Language Processing: Full Papers

Evaluating the Word Sense Disambiguation Performance of Statistical Machine Translation
Marine Carpuat | Dekai Wu
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

Statistical Machine Translation Part II: Tree-Based SMT
Dekai Wu
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

Word Sense Disambiguation vs. Statistical Machine Translation
Marine Carpuat | Dekai Wu
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Recognizing Paraphrases and Textual Entailment Using Inversion Transduction Grammars
Dekai Wu
Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment

2004

Why Nitpicking Works: Evidence for Occam’s Razor in Error Correctors
Dekai Wu | Grace Ngai | Marine Carpuat
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Semi-supervised training of a Kernel PCA-Based Model for Word Sense Disambiguation
Weifeng Su | Marine Carpuat | Dekai Wu
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

Raising the Bar: Stacked Conservative Error Correction Beyond Boosting
Dekai Wu | Grace Ngai | Marine Carpuat
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Using N-best lists for Named Entity Recognition from Chinese Speech
Lufeng Zhai | Pascale Fung | Richard Schwartz | Marine Carpuat | Dekai Wu
Proceedings of HLT-NAACL 2004: Short Papers

A Kernel PCA Method for Superior Word Sense Disambiguation
Dekai Wu | Weifeng Su | Marine Carpuat
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

An Efficient Algorithm to Induce Minimum Average Lookahead Grammars for Incremental LR Parsing
Dekai Wu | Yihai Shen
Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together

Augmenting ensemble classification for Word Sense Disambiguation with a kernel PCA model
Marine Carpuat | Weifeng Su | Dekai Wu
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Semantic role labeling with Boosting, SVMs, Maximum Entropy, SNOW, and Decision Lists
Grace Ngai | Dekai Wu | Marine Carpuat | Chi-Shing Wang | Chi-Yung Wang
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Joining forces to resolve lexical ambiguity: East meets West in Barcelona
Richard Wicentowski | Grace Ngai | Dekai Wu | Marine Carpuat | Emily Thomforde | Adrian Packel
Proceedings of SENSEVAL-3, the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text

Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing
Dekang Lin | Dekai Wu
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

The HKUST leading question translation system
Dekai Wu
Proceedings of Machine Translation Summit IX: Plenaries

A Stacked, Voted, Stacked Model for Named Entity Recognition
Dekai Wu | Grace Ngai | Marine Carpuat
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

2002

Boosting for Named Entity Recognition
Dekai Wu | Grace Ngai | Marine Carpuat | Jeppe Larsen | Yongsheng Yang
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

2000

An Information-Theory-Based Feature Type Analysis for the Modeling of Statistical Parsing
Zhifang Sui | Jun Zhao | Dekai Wu
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

1999

An Information-Theoretic Empirical Analysis of Dependency-Based Feature Types for Word Prediction Models
Dekai Wu | Jun Zhao | Zhifang Sui
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

Automatically Merging Lexicons that have Incompatible Part-of-Speech Categories
Daniel Ka-Leung Chan | Dekai Wu
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

Machine Translation with a Stochastic Grammatical Channel
Dekai Wu | Hongsing Wong
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

Machine Translation with a Stochastic Grammatical Channel
Dekai Wu | Hongsing Wong
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
Dekai Wu
Computational Linguistics, Volume 23, Number 3, September 1997

Dealing with Multilinguality in a Spoken Language Query Translator
Pascale Fung | Bertram Shi | Dekai Wu | Lain Wai Bun | Wong Shuen Kong
Spoken Language Translation

1996

Panel: Next steps in MT research
Lynn Carlson | Jaime Carbonell | David Farwell | Pierre Isabelle | Jackie Murgida | John O’Hara | Dekai Wu
Conference of the Association for Machine Translation in the Americas

A Polynomial-Time Algorithm for Statistical Machine Translation
Dekai Wu
34th Annual Meeting of the Association for Computational Linguistics

Parsing Chinese With an Almost-Context-Free Grammar
Xuanyin Xia | Dekai Wu
Conference on Empirical Methods in Natural Language Processing

1995

Coerced Markov Models for Cross-Lingual Lexical-Tag Relations
Pascale Fung | Dekai Wu
Proceedings of the Sixth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

Grammarless Extraction of Phrasal Translation Examples from Parallel Texts
Dekai Wu
Proceedings of the Sixth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages

An Algorithm for Simultaneously Bracketing Parallel Texts by Aligning Words
Dekai Wu
33rd Annual Meeting of the Association for Computational Linguistics

Trainable Coarse Bilingual Grammars for Parallel Text Bracketing
Dekai Wu
Third Workshop on Very Large Corpora

Using Brackets to Improve Search for Statistical Machine Translation
Dekai Wu | Cindy Ng
Proceedings of the 10th Pacific Asia Conference on Language, Information and Computation

1994

Learning an English-Chinese Lexicon from a Parallel Corpus
Dekai Wu | Xuanyin Xia
Proceedings of the First Conference of the Association for Machine Translation in the Americas

Statistical Augmentation of a Chinese Machine-Readable Dictionary
Pascale Fung | Dekai Wu
Second Workshop on Very Large Corpora

We describe a method of using statistically-collected Chinese character groups from a corpus to augment a Chinese dictionary. The method is particularly useful for extracting domain-specific and regional words not readily available in machine-readable dictionaries. Output was evaluated both using human evaluators and against a previously available dictionary. We also evaluated performance improvement in automatic Chinese tokenization. Results show that our method outputs legitimate words, acronymic constructions, idioms, names and titles, as well as technical compounds, many of which were lacking from the original dictionary.

Improving Chinese Tokenization With Linguistic Filters on Statistical Lexical Acquisition
Dekai Wu | Pascale Fung
Fourth Conference on Applied Natural Language Processing

Book Reviews: Statistically-Driven Computer Grammars of English: The IBM/Lancaster Approach
Dekai Wu
Computational Linguistics, Volume 20, Number 3, September 1994

Aligning a Parallel English-Chinese Corpus Statistically With Lexical Criteria
Dekai Wu
32nd Annual Meeting of the Association for Computational Linguistics

1990

Probabilistic Unification-Based Integration Of Syntactic and Semantic Preferences For Nominal Compounds
Dekai Wu
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

1988

The Berkeley Unix Consultant Project
Robert Wilensky | David N. Chin | Marc Luria | James Martin | James Mayfield | Dekai Wu
Computational Linguistics, Volume 14, Number 4, December 1988

Venues