Hajime Tsukada


2024

Analysis on Unsupervised Acquisition Process of Bilingual Vocabulary through Iterative Back-Translation
Takuma Tanigawa | Tomoyosi Akiba | Hajime Tsukada
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we investigate how new bilingual vocabulary is acquired through Iterative Back-Translation (IBT), a data augmentation method for machine translation that uses monolingual data in both the source and target languages. To reveal the acquisition process, we first identify the word translation pairs in the test data that do not appear in the bilingual data but appear only in the two monolingual datasets, and then observe how many of these pairs are successfully translated by the translation model trained through IBT. We ran the experiments in domain adaptation settings on two language pairs. Our experimental evaluation showed that more than 60% of the new bilingual vocabulary is successfully acquired through IBT, along with an improvement in translation quality in terms of BLEU. It also revealed that new bilingual vocabulary was acquired gradually over repeated IBT iterations. From the results, we present our hypothesis on the process of new bilingual vocabulary acquisition, in which the context of the words plays a critical role in the success of the acquisition.
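The identification step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the tokenized-sentence data layout, and the simplifying rule that a pair counts as "in the bilingual data" when both words co-occur in some parallel sentence pair are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]          # (source word, target word)
Sentence = List[str]            # a tokenized sentence


def cooccurs(pair: Pair, parallel: List[Tuple[Sentence, Sentence]]) -> bool:
    """Assumed criterion: the pair is covered by the bilingual data if both
    words appear in the same parallel sentence pair."""
    src_word, tgt_word = pair
    return any(src_word in src and tgt_word in tgt for src, tgt in parallel)


def find_new_pairs(
    test_pairs: Set[Pair],
    parallel: List[Tuple[Sentence, Sentence]],
    mono_src: List[Sentence],
    mono_tgt: List[Sentence],
) -> Set[Pair]:
    """Keep test pairs whose words occur in the two monolingual corpora
    but that are not covered by the bilingual data."""
    src_vocab = {w for sent in mono_src for w in sent}
    tgt_vocab = {w for sent in mono_tgt for w in sent}
    return {
        (s, t)
        for s, t in test_pairs
        if s in src_vocab and t in tgt_vocab and not cooccurs((s, t), parallel)
    }


def acquisition_rate(new_pairs: Set[Pair], hypotheses: Dict[Pair, Sentence]) -> float:
    """Fraction of new pairs whose target word appears in the system output
    for the test sentence containing the source word (hypothetical mapping)."""
    if not new_pairs:
        return 0.0
    hits = sum(1 for pair in new_pairs if pair[1] in hypotheses.get(pair, []))
    return hits / len(new_pairs)


# Toy data, invented for illustration only.
parallel = [(["the", "cat"], ["le", "chat"])]
mono_src = [["the", "dog", "barks"]]
mono_tgt = [["le", "chien", "aboie"]]
test_pairs = {("cat", "chat"), ("dog", "chien"), ("bird", "oiseau")}

new_pairs = find_new_pairs(test_pairs, parallel, mono_src, mono_tgt)
rate = acquisition_rate(new_pairs, {("dog", "chien"): ["le", "chien", "aboie"]})
```

Tracking `rate` after each IBT iteration would give the kind of acquisition curve the abstract refers to, under the assumptions stated above.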

2015

Improvement of word alignment models for Vietnamese-to-English translation
Takahiro Nomura | Hajime Tsukada | Tomoyoshi Akiba
Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign

2013

Shift-Reduce Word Reordering for Machine Translation
Katsuhiko Hayashi | Katsuhito Sudoh | Hajime Tsukada | Jun Suzuki | Masaaki Nagata
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
Kevin Duh | Graham Neubig | Katsuhito Sudoh | Hajime Tsukada
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

NTT-NAIST SMT systems for IWSLT 2013
Katsuhito Sudoh | Graham Neubig | Kevin Duh | Hajime Tsukada
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper presents the NTT-NAIST SMT systems for the English-German and German-English MT tasks of the IWSLT 2013 evaluation campaign. The systems are based on generalized minimum Bayes risk system combination of three SMT systems: forest-to-string, hierarchical phrase-based, and phrase-based with pre-ordering. The individual SMT systems include data selection for domain adaptation, rescoring using recurrent neural net language models, interpolated language models, and compound word splitting (only for German-English).

2012

Learning to Translate with Multiple Objectives
Kevin Duh | Katsuhito Sudoh | Xianchao Wu | Hajime Tsukada | Masaaki Nagata
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Comparative Study of Target Dependency Structures for Statistical Machine Translation
Xianchao Wu | Katsuhito Sudoh | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Head Finalization Reordering for Chinese-to-Japanese Machine Translation
Dan Han | Katsuhito Sudoh | Xianchao Wu | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

2011

Automatic Error Analysis Based on Grammatical Questions
Tomoki Nagase | Hajime Tsukada | Katsunori Kotani | Nobutoshi Hatanaka | Yoshiyuki Sakamoto
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

Extracting Pre-ordering Rules from Predicate-Argument Structures
Xianchao Wu | Katsuhito Sudoh | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of 5th International Joint Conference on Natural Language Processing

Generalized Minimum Bayes Risk System Combination
Kevin Duh | Katsuhito Sudoh | Xianchao Wu | Hajime Tsukada | Masaaki Nagata
Proceedings of 5th International Joint Conference on Natural Language Processing

Alignment Inference and Bayesian Adaptation for Machine Translation
Kevin Duh | Katsuhito Sudoh | Tomoharu Iwata | Hajime Tsukada
Proceedings of Machine Translation Summit XIII: Papers

Extracting Pre-ordering Rules from Chunk-based Dependency Trees for Japanese-to-English Translation
Xianchao Wu | Katsuhito Sudoh | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of Machine Translation Summit XIII: Papers

Post-ordering in Statistical Machine Translation
Katsuhito Sudoh | Xianchao Wu | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of Machine Translation Summit XIII: Papers

2010

Automatic Evaluation of Translation Quality for Distant Language Pairs
Hideki Isozaki | Tsutomu Hirao | Kevin Duh | Katsuhito Sudoh | Hajime Tsukada
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

NTT statistical MT system for IWSLT 2010
Katsuhito Sudoh | Kevin Duh | Hajime Tsukada
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

Analysis of translation model adaptation in statistical machine translation
Kevin Duh | Katsuhito Sudoh | Hajime Tsukada
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

Head Finalization: A Simple Reordering Rule for SOV Languages
Hideki Isozaki | Katsuhito Sudoh | Hajime Tsukada | Kevin Duh
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

N-Best Reranking by Multitask Learning
Kevin Duh | Katsuhito Sudoh | Hajime Tsukada | Hideki Isozaki | Masaaki Nagata
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Divide and Translate: Improving Long Distance Reordering in Statistical Machine Translation
Katsuhito Sudoh | Kevin Duh | Hajime Tsukada | Tsutomu Hirao | Masaaki Nagata
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Hierarchical Phrase-based Machine Translation with Word-based Reordering Model
Katsuhiko Hayashi | Hajime Tsukada | Katsuhito Sudoh | Kevin Duh | Seiichi Yamamoto
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2009

Structural support vector machines for log-linear approach in statistical machine translation
Katsuhiko Hayashi | Taro Watanabe | Hajime Tsukada | Hideki Isozaki
Proceedings of the 6th International Workshop on Spoken Language Translation: Papers

Minimum error rate training (MERT) is a widely used learning method for statistical machine translation. In this paper, we present an SVM-based training method to enhance generalization ability. We extend MERT optimization by maximizing the margin between the reference and incorrect translations under an L2-norm prior to avoid the overfitting problem. The translation accuracy obtained by our proposed methods is more stable across various conditions than that obtained by MERT. Our experimental results on the French-English WMT08 shared task show that the degradation of our proposed methods is smaller than that of MERT when the training data is small or the test data is out-of-domain.

A Succinct N-gram Language Model
Taro Watanabe | Hajime Tsukada | Hideki Isozaki
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

2008

NTT statistical machine translation system for IWSLT 2008
Katsuhito Sudoh | Taro Watanabe | Jun Suzuki | Hajime Tsukada | Hideki Isozaki
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

The NTT Statistical Machine Translation System consists of two primary components: a statistical machine translation decoder and a reranker. The decoder generates k-best translation candidates using hierarchical phrase-based translation based on a synchronous context-free grammar. The decoder employs a linear combination of several real-valued feature scores from the translation and language models. The reranker reorders the k-best translation candidates using Ranking SVMs with a large number of sparse features. This paper describes the two components and presents the results for the evaluation campaign of IWSLT 2008.
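The reranking step in this abstract can be sketched as linear scoring over sparse features. The sketch below assumes the weights have already been learned (for example by a ranking SVM, whose training is omitted here); the feature names, weight values, and candidate layout are hypothetical and not taken from the paper.

```python
from typing import Dict, List

Features = Dict[str, float]     # sparse feature vector: name -> value


def linear_score(features: Features, weights: Dict[str, float]) -> float:
    """Dot product between a sparse feature vector and a weight vector.
    Features without a learned weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rerank(kbest: List[dict], weights: Dict[str, float]) -> List[dict]:
    """Return the k-best candidates sorted from highest to lowest score."""
    return sorted(
        kbest,
        key=lambda cand: linear_score(cand["features"], weights),
        reverse=True,
    )


# Hypothetical weights and candidates, invented for illustration only.
weights = {"lm": 1.0, "tm": 0.5, "bigram=the_cat": 0.25}
kbest = [
    {"text": "candidate a", "features": {"lm": -2.0, "tm": -1.0}},
    {"text": "candidate b", "features": {"lm": -1.0, "tm": -2.0, "bigram=the_cat": 1.0}},
]

reranked = rerank(kbest, weights)
best = reranked[0]["text"]
```

Storing features as dictionaries keeps the scoring cost proportional to the number of active features per candidate, which matters when the feature set is large and sparse.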

2007

Online Large-Margin Training for Statistical Machine Translation
Taro Watanabe | Jun Suzuki | Hajime Tsukada | Hideki Isozaki
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

Larger feature set approach for machine translation in IWSLT 2007
Taro Watanabe | Jun Suzuki | Katsuhito Sudoh | Hajime Tsukada | Hideki Isozaki
Proceedings of the Fourth International Workshop on Spoken Language Translation

The NTT Statistical Machine Translation System employs a large number of feature functions. First, k-best translation candidates are generated by an efficient decoding method for hierarchical phrase-based translation. Second, the k-best translations are reranked. In both steps, sparse binary features (on the order of millions) are integrated during the search. This paper gives the details of the two steps and shows the results for the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2007.

2006

Incorporating Speech Recognition Confidence into Discriminative Named Entity Recognition of Speech Data
Katsuhito Sudoh | Hajime Tsukada | Hideki Isozaki
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

Left-to-Right Target Generation for Hierarchical Phrase-Based Translation
Taro Watanabe | Hajime Tsukada | Hideki Isozaki
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

NTT statistical machine translation for IWSLT 2006
Taro Watanabe | Jun Suzuki | Hajime Tsukada | Hideki Isozaki
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

NTT System Description for the WMT2006 Shared Task
Taro Watanabe | Hajime Tsukada | Hideki Isozaki
Proceedings on the Workshop on Statistical Machine Translation

2005

Instance-Based Generation for Interactive Restricted Domain Question Answering Systems
Matthias Denecke | Hajime Tsukada
Second International Joint Conference on Natural Language Processing: Full Papers

The NTT Statistical Machine Translation System for IWSLT2005
Hajime Tsukada | Taro Watanabe | Jun Suzuki | Hideto Kazawa | Hideki Isozaki
Proceedings of the Second International Workshop on Spoken Language Translation

2004

Efficient Decoding for Statistical Machine Translation with a Fully Expanded WFST Model
Hajime Tsukada | Masaaki Nagata
Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing

2003

Spoken Interactive ODQA System: SPIQA
Chiori Hori | Takaaki Hori | Hajime Tsukada | Hideki Isozaki | Yutaka Sasaki | Eisaku Maeda
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics