Syntactic SMT and Semantic SMT

Dekai Wu


Abstract
Over the past twenty years, we have attacked the historical methodological barriers between statistical machine translation (SMT) and traditional models of syntax, semantics, and structure. In this tutorial, we will survey some of the central issues and techniques from each of these aspects, with an emphasis on 'deeply theoretically integrated' models, rather than hybrid approaches such as superficial statistical aggregation or system combination of outputs produced by traditional symbolic components. On syntactic SMT, we will explore the trade-offs for SMT between learnability and representational expressiveness. After establishing a foundation in the theory and practice of stochastic transduction grammars, we will examine very recent approaches to automatic unsupervised induction of various classes of transduction grammars. We will show why stochastic linear transduction grammars (LTGs and LITGs) and their preterminalized variants (PLITGs) are proving to be particularly intriguing models for bootstrapping the induction of full-fledged stochastic inversion transduction grammars (ITGs). On semantic SMT, we will explore the trade-offs for SMT involved in applying various lexical semantics models. We will first examine word sense disambiguation (WSD), and discuss why traditional WSD models that are not deeply integrated within the SMT model tend, surprisingly, to fail. In contrast, we will show how a deeply embedded phrase sense disambiguation (PSD) approach succeeds where traditional WSD does not. We will then turn to semantic role labeling (SRL), and discuss the challenges faced by early approaches to applying SRL models to SMT. Finally, on semantic MT evaluation, we will explore some very new human and semi-automatic metrics based on semantic frame agreement. We show that by keeping the metrics deeply grounded within the theoretical framework of semantic frames, the new HMEANT and MEANT metrics can significantly outperform even the state-of-the-art expensive HTER and TER metrics, while at the same time maintaining the desirable characteristics of simplicity, low cost, and representational transparency.
Anthology ID:
2011.mtsummit-tutorials.1
Volume:
Proceedings of Machine Translation Summit XIII: Tutorial Abstracts
Month:
September 19
Year:
2011
Address:
Xiamen, China
Venue:
MTSummit
URL:
https://aclanthology.org/2011.mtsummit-tutorials.1
Cite (ACL):
Dekai Wu. 2011. Syntactic SMT and Semantic SMT. In Proceedings of Machine Translation Summit XIII: Tutorial Abstracts, Xiamen, China.
Cite (Informal):
Syntactic SMT and Semantic SMT (Wu, MTSummit 2011)
PDF:
https://aclanthology.org/2011.mtsummit-tutorials.1.pdf