Hideki Tanaka


2021

Field Experiments of Real Time Foreign News Distribution Powered by MT
Keiji Yasuda | Ichiro Yamada | Naoaki Okazaki | Hideki Tanaka | Hidehiro Asaka | Takeshi Anzai | Fumiaki Sugaya
Proceedings of Machine Translation Summit XVIII: Users and Providers Track

Field experiments on a foreign news distribution system using two key technologies are reported. The first technology is a summarization component, which is used for generating news headlines. This component is a transformer-based abstractive text summarization system which is trained to output headlines from the leading sentences of news articles. The second technology is machine translation (MT), which enables users to read foreign news articles in their native language. Since the system uses MT, users can immediately access the latest foreign news. A total of 139 Japanese LINE users participated in the field experiments for two weeks, viewing about 40,000 articles which had been translated from English to Japanese. We carried out surveys both during and after the experiments. According to the results, 79.3% of users evaluated the headlines as adequate, while 74.7% of users evaluated the automatically translated articles as intelligible. According to the post-experiment survey, 59.7% of users wished to continue using the system; 11.5% of users did not. We also report several statistics of the experiments.

2020

Content-Equivalent Translated Parallel News Corpus and Extension of Domain Adaptation for NMT
Hideya Mino | Hideki Tanaka | Hitoshi Ito | Isao Goto | Ichiro Yamada | Takenobu Tokunaga
Proceedings of the 12th Language Resources and Evaluation Conference

In this paper, we deal with two problems in Japanese-English machine translation of news articles. The first problem is the quality of parallel corpora. Neural machine translation (NMT) systems suffer degraded performance when trained with noisy data. Because there is no clean Japanese-English parallel data for news articles, we build a novel parallel news corpus consisting of Japanese news articles translated into English in a content-equivalent manner. This is the first content-equivalent Japanese-English news corpus translated specifically for training NMT systems. The second problem involves the domain-adaptation technique. NMT systems suffer degraded performance when trained with mixed data having different features, such as noisy data and clean data. Although existing methods try to overcome this problem by using tags to distinguish between corpora, they are not sufficient. We thus extend a domain-adaptation method using multi-tags to train an NMT model effectively with the clean corpus and existing parallel news corpora with some types of noise. Experimental results show that our corpus increases the translation quality, and that our domain-adaptation method is more effective for learning with the multiple types of corpora than existing domain-adaptation methods are.

Neural Machine Translation Using Extracted Context Based on Deep Analysis for the Japanese-English Newswire Task at WAT 2020
Isao Goto | Hideya Mino | Hitoshi Ito | Kazutaka Kinugawa | Ichiro Yamada | Hideki Tanaka
Proceedings of the 7th Workshop on Asian Translation

This paper describes the system of the NHK-NES team for the WAT 2020 Japanese–English newswire task. There are two main problems in Japanese-English news translation: translation of dropped subjects and compatibility between equivalent translations and English news-style outputs. We address these problems by extracting subjects from the context based on predicate-argument structures and using them as additional inputs, and constructing parallel Japanese-English news sentences equivalently translated from English news sentences. The evaluation results confirm the effectiveness of our context-utilization method.

2019

Neural Machine Translation System using a Content-equivalently Translated Parallel Corpus for the Newswire Translation Tasks at WAT 2019
Hideya Mino | Hitoshi Ito | Isao Goto | Ichiro Yamada | Hideki Tanaka | Takenobu Tokunaga
Proceedings of the 6th Workshop on Asian Translation

This paper describes NHK and NHK Engineering System (NHK-ES)’s submission to the newswire translation tasks of WAT 2019 in both directions of Japanese→English and English→Japanese. In addition to the JIJI Corpus that was officially provided by the task organizer, we developed a corpus of 0.22M sentence pairs by manually translating Japanese news sentences into English content-equivalently. The content-equivalent corpus was effective for improving translation quality, and our systems achieved the best human evaluation scores in the newswire translation tasks at WAT 2019.

2017

Detecting Untranslated Content for Neural Machine Translation
Isao Goto | Hideki Tanaka
Proceedings of the First Workshop on Neural Machine Translation

Despite its promise, neural machine translation (NMT) has a serious problem in that source content may be mistakenly left untranslated. The ability to detect untranslated content is important for the practical use of NMT. We evaluate two types of probability with which to detect untranslated content: the cumulative attention (ATN) probability and back translation (BT) probability from the target sentence to the source sentence. Experiments on detecting untranslated content in Japanese-English patent translations show that ATN and BT are each more effective than random choice, BT is more effective than ATN, and the combination of the two provides further improvements. We also confirmed the effectiveness of using ATN and BT to rerank the n-best NMT outputs.

2015

The “News Web Easy” news service as a resource for teaching and learning Japanese: An assessment of the comprehension difficulty of Japanese sentence-end expressions
Hideki Tanaka | Tadashi Kumano | Isao Goto
Proceedings of the 2nd Workshop on Natural Language Processing Techniques for Educational Applications

Japanese news simplification: task design, data set construction, and analysis of simplified text
Isao Goto | Hideki Tanaka | Tadashi Kumano
Proceedings of Machine Translation Summit XV: Papers

2012

Measuring the Similarity between TV Programs using Semantic Relations
Ichiro Yamada | Masaru Miyazaki | Hideki Sumiyoshi | Atsushi Matsui | Hironori Furumiya | Hideki Tanaka
Proceedings of COLING 2012

2009

Syntax-Driven Sentence Revision for Broadcast News Summarization
Hideki Tanaka | Akinori Kinoshita | Takeshi Kobayakawa | Tadashi Kumano | Naoto Katoh
Proceedings of the 2009 Workshop on Language Generation and Summarisation (UCNLG+Sum 2009)

2007

Extracting phrasal alignments from comparable corpora by using joint probability SMT model
Tadashi Kumano | Hideki Tanaka | Takenobu Tokunaga
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

2005

Analysis and Modeling of Manual Summarization of Japanese Broadcast News
Hideki Tanaka | Tadashi Kumano | Masamichi Nishiwaki | Takayuki Itoh
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

2004

Back Transliteration from Japanese to English using Target English Context
Isao Goto | Naoto Kato | Terumasa Ehara | Hideki Tanaka
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2003

Building a parallel corpus for monologues with clause alignment
Hideki Kashioka | Takehiko Maruyama | Hideki Tanaka
Proceedings of Machine Translation Summit IX: Papers

Many studies have been reported in the domain of speech-to-speech machine translation systems for travel conversation use. Therefore, a large number of travel domain corpora have become available in recent years. From a wider viewpoint, speech-to-speech systems are required for many purposes other than travel conversation. One of these is monologues (e.g., TV news, lectures, technical presentations). However, in monologues, sentences tend to be long and complicated, which often causes problems for parsing and translation. Therefore, we need a suitable translation unit, rather than the sentence. We propose the clause as a unit for translation. To develop a speech-to-speech machine translation system for monologues based on the clause as the translation unit, we need a monologue parallel corpus with clause alignment. In this paper, we describe how to build a Japanese-English monologue parallel corpus with clauses aligned, and discuss the features of this corpus.

A multi-language translation example browser
Isao Goto | Naoto Kato | Noriyoshi Uratani | Terumasa Ehara | Tadashi Kumano | Hideki Tanaka
Proceedings of Machine Translation Summit IX: System Presentations

This paper describes a Multi-language Translation Example Browser, a type of translation memory system. The system is able to retrieve translation examples from bilingual news databases, which consist of news transcripts of past broadcasts. We put a Japanese-English system to practical use and undertook trial operations of a system of eight language-pairs.

Word Selection for EBMT based on Monolingual Similarity and Translation Confidence
Eiji Aramaki | Sadao Kurohashi | Hideki Kashioka | Hideki Tanaka
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

Comparing the Sentence Alignment Yield from Two News Corpora Using a Dictionary-Based Alignment System
Stephen Nightingale | Hideki Tanaka
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

Construction and Analysis of Japanese-English Broadcast News Corpus with Named Entity Tags
Tadashi Kumano | Hideki Kashioka | Hideki Tanaka | Takahiro Fukusima
Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition

2002

Automatic Alignment of Japanese and English Newspaper Articles using an MT System and a Bilingual Company Name Dictionary
Kenji Matsumoto | Hideki Tanaka
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

ATR-SLT System for SENSEVAL-2 Japanese Translation Task
Tadashi Kumano | Hideki Kashioka | Hideki Tanaka
Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems

1999

An Efficient Statistical Speech Act Type Tagging System for Speech Translation Systems
Hideki Tanaka | Akio Yokoo
Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics

1998

Context Management with Topics for Spoken Dialogue Systems
Kristiina Jokinen | Hideki Tanaka | Akio Yokoo
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Context Management with Topics for Spoken Dialogue Systems
Kristiina Jokinen | Hideki Tanaka | Akio Yokoo
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Planning Dialogue Contributions With New Information
Kristiina Jokinen | Hideki Tanaka | Akio Yokoo
Natural Language Generation

1996

Decision Tree Learning Algorithm with Structured Attributes: Application to Verbal Case Frame Acquisition
Hideki Tanaka
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

1994

Verbal Case Frame Acquisition From a Bilingual Corpus: Gradual Knowledge Acquisition
Hideki Tanaka
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

1992

A Method of Translating English Delexical Structures Into Japanese
Hideki Tanaka | Teruaki Aizawa | Yeun-Bae Kim | Nobuko Hatada
COLING 1992 Volume 2: The 14th International Conference on Computational Linguistics

1990

A Machine Translation System for Foreign News in Satellite Broadcasting
Teruaki Aizawa | Terumasa Ehara | Noriyoshi Uratani | Hideki Tanaka | Naoto Kato | Sumio Nakase | Norikazu Aruga | Takeo Matsuda
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics