Katsuhito Sudoh


2021

Proceedings of the 8th Workshop on Asian Translation (WAT2021)
Toshiaki Nakazawa | Hideki Nakayama | Isao Goto | Hideya Mino | Chenchen Ding | Raj Dabre | Anoop Kunchukuttan | Shohei Higashiyama | Hiroshi Manabe | Win Pa Pa | Shantipriya Parida | Ondřej Bojar | Chenhui Chu | Akiko Eriguchi | Kaori Abe | Yusuke Oda | Katsuhito Sudoh | Sadao Kurohashi | Pushpak Bhattacharyya
Proceedings of the 8th Workshop on Asian Translation (WAT2021)

Is This Translation Error Critical?: Classification-Based Human and Automatic Machine Translation Evaluation Focusing on Critical Errors
Katsuhito Sudoh | Kosuke Takahashi | Satoshi Nakamura
Proceedings of the Workshop on Human Evaluation of NLP Systems (HumEval)

This paper discusses a classification-based approach to machine translation evaluation, as opposed to a common regression-based approach in the WMT Metrics task. Recent machine translation usually works well but sometimes makes critical errors due to just a few wrong word choices. Our classification-based approach focuses on such errors using several error type labels, for practical machine translation evaluation in an age of neural machine translation. We made additional annotations on the WMT 2015-2017 Metrics datasets with fluency and adequacy labels to distinguish different types of translation errors from syntactic and semantic viewpoints. We present our human evaluation criteria for the corpus development and automatic evaluation experiments using the corpus. The human evaluation corpus will be publicly available upon publication.

FINDINGS OF THE IWSLT 2021 EVALUATION CAMPAIGN
Antonios Anastasopoulos | Ondřej Bojar | Jacob Bremerman | Roldano Cattoni | Maha Elbayad | Marcello Federico | Xutai Ma | Satoshi Nakamura | Matteo Negri | Jan Niehues | Juan Pino | Elizabeth Salesky | Sebastian Stüker | Katsuhito Sudoh | Marco Turchi | Alexander Waibel | Changhan Wang | Matthew Wiesner
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured four shared tasks this year: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, and (iv) Low-resource speech translation. A total of 22 teams participated in at least one of the tasks. This paper describes each shared task, its data and evaluation metrics, and reports the results of the received submissions.

NAIST English-to-Japanese Simultaneous Translation System for IWSLT 2021 Simultaneous Text-to-text Task
Ryo Fukuda | Yui Oka | Yasumasa Kano | Yuki Yano | Yuka Ko | Hirotaka Tokuyama | Kosuke Doi | Sakriani Sakti | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes NAIST’s system for the English-to-Japanese Simultaneous Text-to-text Translation Task in the IWSLT 2021 Evaluation Campaign. Our primary submission is based on wait-k neural machine translation with sequence-level knowledge distillation to encourage literal translation.
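
The wait-k policy named in this abstract is commonly described as reading k source tokens before writing the first target token, then alternating one write with one read until the source is exhausted. The sketch below illustrates only that read/write schedule under this common description; it is not the NAIST system, and the function names `wait_k_schedule` and `wait_k_actions` are hypothetical.

```python
def wait_k_schedule(source_len: int, target_len: int, k: int) -> list[int]:
    """For each target position t (1-based), return how many source tokens
    have been read before that target token is written."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Before writing token t, k + t - 1 source tokens have been read,
    # capped at the full source length.
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


def wait_k_actions(source_len: int, target_len: int, k: int) -> list[str]:
    """Expand the schedule into an explicit READ/WRITE action sequence."""
    actions = []
    read = 0
    for needed in wait_k_schedule(source_len, target_len, k):
        while read < needed:
            actions.append("READ")
            read += 1
        actions.append("WRITE")
    return actions


if __name__ == "__main__":
    print(wait_k_schedule(source_len=5, target_len=6, k=2))
    print(wait_k_actions(source_len=3, target_len=3, k=2))
```

A smaller k lowers latency because writing starts earlier, at the cost of translating with less source context.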

On Knowledge Distillation for Translating Erroneous Speech Transcriptions
Ryo Fukuda | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

Recent studies argue that knowledge distillation is promising for speech translation (ST) using end-to-end models. In this work, we investigate the effect of knowledge distillation with a cascade ST using automatic speech recognition (ASR) and machine translation (MT) models. We distill knowledge from a teacher model based on human transcripts to a student model based on erroneous transcriptions. Our experimental results demonstrated that knowledge distillation is beneficial for a cascade ST. Further investigation that combined knowledge distillation and fine-tuning revealed that the combination consistently improved two language pairs: English-Italian and Spanish-English.

Large-Scale English-Japanese Simultaneous Interpretation Corpus: Construction and Analyses with Sentence-Aligned Data
Kosuke Doi | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)

This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis. A portion of the corpus contains SI data from three interpreters with different amounts of experience. Some of the SI data were manually aligned with the source speeches at the sentence level. Their latency, quality, and word order aspects were compared among the SI data themselves as well as against offline translations. The results showed that (1) interpreters with more experience controlled the latency and quality better, and (2) large latency hurt the SI quality.

ARTA: Collection and Classification of Ambiguous Requests and Thoughtful Actions
Shohei Tanaka | Koichiro Yoshino | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Human-assisting systems such as dialogue systems must take thoughtful, appropriate actions not only for clear and unambiguous user requests, but also for ambiguous user requests, even if the users themselves are not aware of their potential requirements. To construct such a dialogue agent, we collected a corpus and developed a model that classifies ambiguous user requests into corresponding system actions. In order to collect a high-quality corpus, we asked workers to input antecedent user requests whose pre-defined actions could be regarded as thoughtful. Although multiple actions could be identified as thoughtful for a single user request, annotating all combinations of user requests and system actions is impractical. For this reason, we fully annotated only the test data and left the annotation of the training data incomplete. In order to train the classification model on such training data, we applied the positive/unlabeled (PU) learning method, which assumes that only a part of the data is labeled with positive examples. The experimental results show that the PU learning method achieved better performance than the general positive/negative (PN) learning method to classify thoughtful actions given an ambiguous user request.

2020

Automatic Machine Translation Evaluation using Source Language Inputs and Cross-lingual Language Model
Kosuke Takahashi | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We propose an automatic evaluation method of machine translation that uses source language sentences regarded as additional pseudo references. The proposed method evaluates a translation hypothesis in a regression model. The model takes the paired source, reference, and hypothesis sentence all together as an input. A pretrained large scale cross-lingual language model encodes the input to sentence-pair vectors, and the model predicts a human evaluation score with those vectors. Our experiments show that our proposed method using Cross-lingual Language Model (XLM) trained with a translation language modeling (TLM) objective achieves a higher correlation with human judgments than a baseline method that uses only hypothesis and reference sentences. Additionally, using source sentences in our proposed method is confirmed to improve the evaluation performance.

Reflection-based Word Attribute Transfer
Yoichi Ishibashi | Katsuhito Sudoh | Koichiro Yoshino | Satoshi Nakamura
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Word embeddings, which often represent such analogic relations as king - man + woman ≈ queen, can be used to change a word’s attribute, including its gender. For transferring king into queen in this analogy-based manner, we subtract a difference vector man - woman based on the knowledge that king is male. However, developing such knowledge is very costly for words and attributes. In this work, we propose a novel method for word attribute transfer based on reflection mappings without such an analogy operation. Experimental results show that our proposed method can transfer the word attributes of the given words without changing the words that do not have the target attributes.
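
As a geometric illustration of what a reflection mapping does, the sketch below reflects a vector across a hyperplane (a "mirror") given by a normal vector `a` and a point `c` on the plane. The mirror here is hand-picked for the example and the function name `reflect` is hypothetical; the abstract does not say how the mirror is obtained, so this shows only the general operation, not the paper's model. Two properties are visible: applying the reflection twice returns the original vector, and a vector lying on the mirror is left unchanged.

```python
def reflect(v: list[float], a: list[float], c: list[float]) -> list[float]:
    """Reflect v across the hyperplane through point c with normal vector a."""
    norm_sq = sum(x * x for x in a)
    if norm_sq == 0:
        raise ValueError("the normal vector must be non-zero")
    # Signed distance of v from the mirror along a, doubled.
    scale = 2.0 * sum((vi - ci) * ai for vi, ai, ci in zip(v, a, c)) / norm_sq
    return [vi - scale * ai for vi, ai in zip(v, a)]


if __name__ == "__main__":
    mirror_normal = [1.0, 0.0]   # assumed: the first axis carries the attribute
    mirror_point = [0.0, 0.0]
    word = [2.0, 3.0]
    flipped = reflect(word, mirror_normal, mirror_point)
    print(flipped)                                        # attribute side flipped
    print(reflect(flipped, mirror_normal, mirror_point))  # back to the original
    print(reflect([0.0, 5.0], mirror_normal, mirror_point))  # on the mirror: unchanged
```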

Improving Spoken Language Understanding by Wisdom of Crowds
Koichiro Yoshino | Kana Ikeuchi | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 28th International Conference on Computational Linguistics

Spoken language understanding (SLU), which converts user requests in natural language to machine-interpretable expressions, is becoming an essential task. The lack of training data is an important problem, especially for new system tasks, because existing SLU systems are based on statistical approaches. In this paper, we propose using two sources of the “wisdom of crowds,” crowdsourcing and a knowledge community website, to improve the SLU system. We first collected paraphrasing variations for new system tasks through crowdsourcing as seed data, and then augmented them using similar questions from a knowledge community website. We investigated the effects of the proposed data augmentation method on the SLU task, even with small seed data. In particular, the proposed architecture augmented more than 120,000 samples to improve SLU accuracies.

Incorporating Noisy Length Constraints into Transformer with Length-aware Positional Encodings
Yui Oka | Katsuki Chousa | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 28th International Conference on Computational Linguistics

Neural Machine Translation often suffers from an under-translation problem due to its limited modeling of output sequence lengths. In this work, we propose a novel approach to training a Transformer model using length constraints based on length-aware positional encoding (PE). Since length constraints with exact target sentence lengths degrade translation performance, we add random noise within a certain window size to the length constraints in the PE during the training. In the inference step, we predict the output lengths using input sequences and a BERT-based length prediction model. Experimental results in an ASPEC English-to-Japanese translation showed the proposed method produced translations with lengths close to the reference ones and outperformed a vanilla Transformer (especially in short sentences) by 3.22 points in BLEU. The average translation results using our length prediction model were also better than another baseline method using input lengths for the length constraints. The proposed noise injection improved robustness for length prediction errors, especially within the window size.
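
The training-time noise injection described above can be pictured as perturbing the true target length by a random offset inside a window before it is used as the length constraint. The sketch below assumes a uniform integer offset and a minimum length of 1; both are assumptions for illustration, since the abstract does not state the noise distribution, and `noisy_length` is a hypothetical name.

```python
import random


def noisy_length(target_len: int, window: int, rng: random.Random) -> int:
    """Return the target length shifted by a random offset in [-window, window].

    The uniform offset and the floor of 1 are illustrative assumptions.
    """
    if window < 0:
        raise ValueError("window must be non-negative")
    return max(1, target_len + rng.randint(-window, window))


if __name__ == "__main__":
    rng = random.Random(0)
    # Length constraints a model might see for a 20-token reference.
    print([noisy_length(20, window=3, rng=rng) for _ in range(10)])
```

Training on perturbed lengths is what lets the model tolerate an imperfect length prediction at inference time, as long as the prediction error stays within the window.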

Proceedings of the 7th Workshop on Asian Translation
Toshiaki Nakazawa | Hideki Nakayama | Chenchen Ding | Raj Dabre | Anoop Kunchukuttan | Win Pa Pa | Ondřej Bojar | Shantipriya Parida | Isao Goto | Hideya Mino | Hiroshi Manabe | Katsuhito Sudoh | Sadao Kurohashi | Pushpak Bhattacharyya
Proceedings of the 7th Workshop on Asian Translation

NAIST’s Machine Translation Systems for IWSLT 2020 Conversational Speech Translation Task
Ryo Fukuda | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes NAIST’s NMT system submitted to the IWSLT 2020 conversational speech translation task. We focus on the translation of disfluent speech transcripts that include ASR errors and non-grammatical utterances. We tried a domain adaptation method by transferring the styles of out-of-domain data (United Nations Parallel Corpus) to be like in-domain data (Fisher transcripts). Our system results showed that the NMT model with domain adaptation outperformed a baseline. In addition, a slight improvement from the style transfer was observed.

2019

Proceedings of the 3rd Workshop on Neural Generation and Translation
Alexandra Birch | Andrew Finch | Hiroaki Hayashi | Ioannis Konstas | Thang Luong | Graham Neubig | Yusuke Oda | Katsuhito Sudoh
Proceedings of the 3rd Workshop on Neural Generation and Translation

Findings of the Third Workshop on Neural Generation and Translation
Hiroaki Hayashi | Yusuke Oda | Alexandra Birch | Ioannis Konstas | Andrew Finch | Minh-Thang Luong | Graham Neubig | Katsuhito Sudoh
Proceedings of the 3rd Workshop on Neural Generation and Translation

This document describes the findings of the Third Workshop on Neural Generation and Translation, held in concert with the annual Conference on Empirical Methods in Natural Language Processing (EMNLP 2019). First, we summarize the research trends of papers presented in the proceedings. Second, we describe the results of the two shared tasks: 1) efficient neural machine translation (NMT), where participants were tasked with creating NMT systems that are both accurate and efficient, and 2) document generation and translation (DGT), where participants were tasked with developing systems that generate summaries from structured data, potentially with assistance from text in another language.

Conversational Response Re-ranking Based on Event Causality and Role Factored Tensor Event Embedding
Shohei Tanaka | Koichiro Yoshino | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the First Workshop on NLP for Conversational AI

We propose a novel method for selecting coherent and diverse responses for a given dialogue context. The proposed method re-ranks response candidates generated from conversational models by using event causality relations between events in a dialogue history and response candidates (e.g., “be stressed out” precedes “relieve stress”). We use distributed event representation based on the Role Factored Tensor Model for a robust matching of event causality relations due to limited event causality knowledge of the system. Experimental results showed that the proposed method improved coherency and dialogue continuity of system responses.

Proceedings of The 8th Workshop on Patent and Scientific Literature Translation
Takehito Utsuro | Katsuhito Sudoh | Takashi Tsunakawa
Proceedings of The 8th Workshop on Patent and Scientific Literature Translation

2018

Multi-Source Neural Machine Translation with Missing Data
Yuta Nishimura | Katsuhito Sudoh | Graham Neubig | Satoshi Nakamura
Proceedings of the 2nd Workshop on Neural Machine Translation and Generation

Multi-source translation is an approach to exploit multiple inputs (e.g. in two different languages) to increase translation accuracy. In this paper, we examine approaches for multi-source neural machine translation (NMT) using an incomplete multilingual corpus in which some translations are missing. In practice, many multilingual corpora are not complete because of the difficulty of providing translations in all of the relevant languages (for example, in TED talks, most English talks only have subtitles for a small portion of the languages that TED supports). Existing studies on multi-source translation did not explicitly handle such situations. This study focuses on the use of incomplete multilingual corpora in multi-encoder NMT and mixture of NMT experts and examines a very simple implementation where missing source translations are replaced by a special symbol <NULL>. These methods allow us to use incomplete corpora both at training time and test time. In experiments with real incomplete multilingual corpora of TED Talks, the multi-source NMT with the <NULL> tokens achieved higher translation accuracies measured by BLEU than any of the one-to-one NMT systems.
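
The replacement step described in this abstract can be sketched as a small preprocessing function: for each example, every expected source language gets either its sentence or the placeholder symbol. The data layout (a dictionary from language code to sentence) and the name `fill_missing_sources` are assumptions made for illustration, not the authors' code.

```python
NULL_TOKEN = "<NULL>"


def fill_missing_sources(example: dict, source_langs: list[str]) -> list[str]:
    """Return one input per source language, using NULL_TOKEN where the
    translation is absent (missing key, None, or empty string)."""
    return [example.get(lang) or NULL_TOKEN for lang in source_langs]


if __name__ == "__main__":
    langs = ["es", "fr", "ar"]
    example = {"es": "hola a todos", "fr": None}  # no Arabic entry at all
    print(fill_missing_sources(example, langs))
```

Because every example then has the same number of inputs, the same multi-source model can consume complete and incomplete examples alike.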

Overview of the 5th Workshop on Asian Translation
Toshiaki Nakazawa | Katsuhito Sudoh | Shohei Higashiyama | Chenchen Ding | Raj Dabre | Hideya Mino | Isao Goto | Win Pa Pa | Anoop Kunchukuttan | Sadao Kurohashi
Proceedings of the 32nd Pacific Asia Conference on Language, Information and Computation: 5th Workshop on Asian Translation

2017

An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation
Makoto Morishita | Yusuke Oda | Graham Neubig | Koichiro Yoshino | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the First Workshop on Neural Machine Translation

Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essential step in NMT training, widely used NMT toolkits implement disparate strategies for doing so, which have not been empirically validated or compared. This work investigates mini-batch creation strategies with experiments over two different datasets. Our results suggest that the choice of a mini-batch creation strategy has a large effect on NMT training and some length-based sorting strategies do not always work well compared with simple shuffling.
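
The padding effect mentioned in this abstract is easy to quantify: each sentence in a mini-batch is padded up to the longest sentence in that batch, so grouping sentences of similar length wastes fewer positions. The sketch below compares length-sorted batching with shuffled batching on toy sentence lengths. It measures padding only and says nothing about training quality, which is the question the paper studies; the function names are hypothetical.

```python
import random


def make_batches(lengths, batch_size, sort_by_length, seed=0):
    """Group sentence lengths into consecutive mini-batches,
    either after sorting by length or after a seeded shuffle."""
    order = list(lengths)
    if sort_by_length:
        order.sort()
    else:
        random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padding_tokens(batches):
    """Count the pad positions needed to bring every sentence up to the
    longest sentence in its own batch."""
    return sum(max(batch) * len(batch) - sum(batch) for batch in batches)


if __name__ == "__main__":
    lengths = [3, 10, 4, 9, 5, 8]
    print("sorted:  ", padding_tokens(make_batches(lengths, 2, sort_by_length=True)))
    print("shuffled:", padding_tokens(make_batches(lengths, 2, sort_by_length=False)))
```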

Tree as a Pivot: Syntactic Matching Methods in Pivot Translation
Akiva Miura | Graham Neubig | Katsuhito Sudoh | Satoshi Nakamura
Proceedings of the Second Conference on Machine Translation

A Simple and Strong Baseline: NAIST-NICT Neural Machine Translation System for WAT2017 English-Japanese Translation Task
Yusuke Oda | Katsuhito Sudoh | Satoshi Nakamura | Masao Utiyama | Eiichiro Sumita
Proceedings of the 4th Workshop on Asian Translation (WAT2017)

This paper describes the details of the NAIST-NICT machine translation system for the WAT2017 English-Japanese Scientific Paper Translation Task. The system consists of a language-independent tokenizer and an attentional encoder-decoder style neural machine translation model. According to the official results, our system achieves higher translation accuracy than any system submitted to previous campaigns despite its simple model architecture.

2016

Neural Reordering Model Considering Phrase Translation and Word Alignment for Phrase-based Translation
Shin Kanouchi | Katsuhito Sudoh | Mamoru Komachi
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper presents an improved lexicalized reordering model for phrase-based statistical machine translation using a deep neural network. Lexicalized reordering suffers from reordering ambiguity, data sparseness, and noise in a phrase table. A previous neural reordering model successfully addresses the first and second problems but fails to address the third. Therefore, we propose new features using phrase translation and word alignment to construct phrase vectors to handle inherently noisy phrase translation pairs. The experimental results show that our proposed method improves the accuracy of phrase reordering. We confirm that the proposed method works well with phrase pairs including NULL alignments.

Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2016
Katsuhito Sudoh | Masaaki Nagata
Proceedings of the 3rd Workshop on Asian Translation (WAT2016)

This paper presents our Chinese-to-Japanese patent machine translation system for WAT 2016 (Group ID: ntt) that uses syntactic pre-ordering over Chinese dependency structures. Chinese words are reordered by a learning-to-rank model based on pairwise classification to obtain word order close to Japanese. In this year’s system, two different machine translation methods are compared: traditional phrase-based statistical machine translation and recent sequence-to-sequence neural machine translation with an attention mechanism. Our pre-ordering showed a significant improvement over the phrase-based baseline, but, in contrast, it degraded the neural machine translation baseline.

Exploring Text Links for Coherent Multi-Document Summarization
Xun Wang | Masaaki Nishino | Tsutomu Hirao | Katsuhito Sudoh | Masaaki Nagata
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Summarization aims to represent source documents by a shortened passage. Existing methods focus on the extraction of key information, but often neglect coherence. Hence the generated summaries suffer from a lack of readability. To address this problem, we have developed a graph-based method by exploring the links between text to produce coherent summaries. Our approach involves finding a sequence of sentences that best represent the key information in a coherent way. In contrast to the previous methods that focus only on salience, the proposed method addresses both coherence and informativeness based on textual linkages. We conduct experiments on the DUC2004 summarization task data set. A performance comparison reveals that the summaries generated by the proposed system achieve comparable results in terms of the ROUGE metric, and show improvements in readability by human evaluation.

2015

Empty Category Detection With Joint Context-Label Embeddings
Xun Wang | Katsuhito Sudoh | Masaaki Nagata
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Discriminative Preordering Meets Kendall’s 𝜏 Maximization
Sho Hoshino | Yusuke Miyao | Katsuhito Sudoh | Katsuhiko Hayashi | Masaaki Nagata
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Chinese-to-Japanese Patent Machine Translation based on Syntactic Pre-ordering for WAT 2015
Katsuhito Sudoh | Masaaki Nagata
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)

2014

NTT-NAIST syntax-based SMT systems for IWSLT 2014
Katsuhito Sudoh | Graham Neubig | Kevin Duh | Katsuhiko Hayashi
Proceedings of the 11th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper presents NTT-NAIST SMT systems for English-German and German-English MT tasks of the IWSLT 2014 evaluation campaign. The systems are based on generalized minimum Bayes risk system combination of three SMT systems using the forest-to-string, syntactic preordering, and phrase-based translation formalisms. Individual systems employ training data selection for domain adaptation, truecasing, compound word splitting (for German-English), interpolated n-gram language models, and hypotheses rescoring using recurrent neural network language models.

Japanese-to-English patent translation system based on domain-adapted word segmentation and post-ordering
Katsuhito Sudoh | Masaaki Nagata | Shinsuke Mori | Tatsuya Kawahara
Proceedings of the 11th Conference of the Association for Machine Translation in the Americas: MT Researchers Track

This paper presents a Japanese-to-English statistical machine translation system specialized for patent translation. Patents are practically useful technical documents, but their translation needs different efforts from general-purpose translation. There are two important problems in the Japanese-to-English patent translation: long distance reordering and lexical translation of many domain-specific terms. We integrated novel lexical translation of domain-specific terms with a syntax-based post-ordering framework that divides the machine translation problem into lexical translation and reordering explicitly for efficient syntax-based translation. The proposed lexical translation consists of a domain-adapted word segmentation and an unknown word transliteration. Experimental results show our system achieves better translation accuracy in BLEU and TER compared to the baseline methods.

2013

Two-Stage Pre-ordering for Japanese-to-English Statistical Machine Translation
Sho Hoshino | Yusuke Miyao | Katsuhito Sudoh | Masaaki Nagata
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Noise-Aware Character Alignment for Bootstrapping Statistical Machine Transliteration from Bilingual Corpora
Katsuhito Sudoh | Shinsuke Mori | Masaaki Nagata
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Shift-Reduce Word Reordering for Machine Translation
Katsuhiko Hayashi | Katsuhito Sudoh | Hajime Tsukada | Jun Suzuki | Masaaki Nagata
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing

Effects of Parsing Errors on Pre-Reordering Performance for Chinese-to-Japanese SMT
Dan Han | Pascual Martínez-Gómez | Yusuke Miyao | Katsuhito Sudoh | Masaaki Nagata
Proceedings of the 27th Pacific Asia Conference on Language, Information, and Computation (PACLIC 27)

Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
Kevin Duh | Graham Neubig | Katsuhito Sudoh | Hajime Tsukada
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

NTT-NAIST SMT systems for IWSLT 2013
Katsuhito Sudoh | Graham Neubig | Kevin Duh | Hajime Tsukada
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper presents NTT-NAIST SMT systems for English-German and German-English MT tasks of the IWSLT 2013 evaluation campaign. The systems are based on generalized minimum Bayes risk system combination of three SMT systems: forest-to-string, hierarchical phrase-based, and phrase-based with pre-ordering. Individual SMT systems include data selection for domain adaptation, rescoring using recurrent neural net language models, interpolated language models, and compound word splitting (only for German-English).

Using unlabeled dependency parsing for pre-reordering for Chinese-to-Japanese statistical machine translation
Dan Han | Pascual Martínez-Gómez | Yusuke Miyao | Katsuhito Sudoh | Masaaki Nagata
Proceedings of the Second Workshop on Hybrid Approaches to Translation

2012

Learning to Translate with Multiple Objectives
Kevin Duh | Katsuhito Sudoh | Xianchao Wu | Hajime Tsukada | Masaaki Nagata
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

A Comparative Study of Target Dependency Structures for Statistical Machine Translation
Xianchao Wu | Katsuhito Sudoh | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Head Finalization Reordering for Chinese-to-Japanese Machine Translation
Dan Han | Katsuhito Sudoh | Xianchao Wu | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

Zero Pronoun Resolution can Improve the Quality of J-E Translation
Hirotoshi Taira | Katsuhito Sudoh | Masaaki Nagata
Proceedings of the Sixth Workshop on Syntax, Semantics and Structure in Statistical Translation

2011

Extracting Pre-ordering Rules from Predicate-Argument Structures
Xianchao Wu | Katsuhito Sudoh | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of 5th International Joint Conference on Natural Language Processing

Generalized Minimum Bayes Risk System Combination
Kevin Duh | Katsuhito Sudoh | Xianchao Wu | Hajime Tsukada | Masaaki Nagata
Proceedings of 5th International Joint Conference on Natural Language Processing

Alignment Inference and Bayesian Adaptation for Machine Translation
Kevin Duh | Katsuhito Sudoh | Tomoharu Iwata | Hajime Tsukada
Proceedings of Machine Translation Summit XIII: Papers

Extracting Pre-ordering Rules from Chunk-based Dependency Trees for Japanese-to-English Translation
Xianchao Wu | Katsuhito Sudoh | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of Machine Translation Summit XIII: Papers

Post-ordering in Statistical Machine Translation
Katsuhito Sudoh | Xianchao Wu | Kevin Duh | Hajime Tsukada | Masaaki Nagata
Proceedings of Machine Translation Summit XIII: Papers

2010

NTT statistical MT system for IWSLT 2010
Katsuhito Sudoh | Kevin Duh | Hajime Tsukada
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

Analysis of translation model adaptation in statistical machine translation
Kevin Duh | Katsuhito Sudoh | Hajime Tsukada
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

Hierarchical Phrase-based Machine Translation with Word-based Reordering Model
Katsuhiko Hayashi | Hajime Tsukada | Katsuhito Sudoh | Kevin Duh | Seiichi Yamamoto
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

Automatic Evaluation of Translation Quality for Distant Language Pairs
Hideki Isozaki | Tsutomu Hirao | Kevin Duh | Katsuhito Sudoh | Hajime Tsukada
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Head Finalization: A Simple Reordering Rule for SOV Languages
Hideki Isozaki | Katsuhito Sudoh | Hajime Tsukada | Kevin Duh
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

N-Best Reranking by Multitask Learning
Kevin Duh | Katsuhito Sudoh | Hajime Tsukada | Hideki Isozaki | Masaaki Nagata
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

Divide and Translate: Improving Long Distance Reordering in Statistical Machine Translation
Katsuhito Sudoh | Kevin Duh | Hajime Tsukada | Tsutomu Hirao | Masaaki Nagata
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR

2008

NTT statistical machine translation system for IWSLT 2008
Katsuhito Sudoh | Taro Watanabe | Jun Suzuki | Hajime Tsukada | Hideki Isozaki
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

The NTT Statistical Machine Translation System consists of two primary components: a statistical machine translation decoder and a reranker. The decoder generates k-best translation candidates using a hierarchical phrase-based translation based on synchronous context-free grammar. The decoder employs a linear feature combination among several real-valued scores on translation and language models. The reranker reorders the k-best translation candidates using Ranking SVMs with a large number of sparse features. This paper describes the two components and presents the results for the evaluation campaign of IWSLT 2008.

2007

Larger feature set approach for machine translation in IWSLT 2007
Taro Watanabe | Jun Suzuki | Katsuhito Sudoh | Hajime Tsukada | Hideki Isozaki
Proceedings of the Fourth International Workshop on Spoken Language Translation

The NTT Statistical Machine Translation System employs a large number of feature functions. First, k-best translation candidates are generated by an efficient decoding method of hierarchical phrase-based translation. Second, the k-best translations are reranked. In both steps, sparse binary features — of the order of millions — are integrated during the search. This paper gives the details of the two steps and shows the results for the Evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT) 2007.

2006

Incorporating Speech Recognition Confidence into Discriminative Named Entity Recognition of Speech Data
Katsuhito Sudoh | Hajime Tsukada | Hideki Isozaki
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics