Tetsuji Nakagawa


2023

LEALLA: Learning Lightweight Language-agnostic Sentence Embeddings with Knowledge Distillation
Zhuoyuan Mao | Tetsuji Nakagawa
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Large-scale language-agnostic sentence embedding models such as LaBSE (Feng et al., 2022) obtain state-of-the-art performance for parallel sentence alignment. However, these large-scale models can suffer from slow inference and high computation overhead. This study systematically explores learning language-agnostic sentence embeddings with lightweight models. We demonstrate that a thin-deep encoder can construct robust low-dimensional sentence embeddings for 109 languages. With our proposed distillation methods, we achieve further improvements by incorporating knowledge from a teacher model. Empirical results on Tatoeba, United Nations, and BUCC show the effectiveness of our lightweight models. We release our lightweight language-agnostic sentence embedding models LEALLA on TensorFlow Hub.

2018

Denoising Neural Machine Translation Training with Trusted Data and Online Data Selection
Wei Wang | Taro Watanabe | Macduff Hughes | Tetsuji Nakagawa | Ciprian Chelba
Proceedings of the Third Conference on Machine Translation: Research Papers

Measuring domain relevance of data and identifying or selecting well-fit domain data for machine translation (MT) is a well-studied topic, but denoising is not. Denoising is concerned with a different type of data quality and tries to reduce the negative impact of data noise on MT training, in particular, neural MT (NMT) training. This paper generalizes methods for measuring and selecting data for domain MT and applies them to denoising NMT training. The proposed approach uses trusted data and a denoising curriculum realized by online data selection. Intrinsic and extrinsic evaluations show that the approach is significantly effective for training NMT on data with severe noise.

2017

An Empirical Study of Language Relatedness for Transfer Learning in Neural Machine Translation
Raj Dabre | Tetsuji Nakagawa | Hideto Kazawa
Proceedings of the 31st Pacific Asia Conference on Language, Information and Computation

2016

Phrase-based Machine Translation using Multiple Preordering Candidates
Yusuke Oda | Taku Kudo | Tetsuji Nakagawa | Taro Watanabe
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper, we propose a new decoding method for phrase-based statistical machine translation which directly uses multiple preordering candidates as a graph structure. Compared with previous phrase-based decoding methods, our method is based on a simple left-to-right dynamic programming in which no decoding-time reordering is performed. As a result, it runs very fast and is easy to implement. Our system does not depend on specific preordering methods as long as they output multiple preordering candidates, and it is trivial to plug existing preordering methods into our system. In our experiments translating 11 diverse languages into English, the proposed method outperforms a conventional phrase-based decoder in translation quality at comparable or faster decoding time.

2015

Efficient Top-Down BTG Parsing for Machine Translation Preordering
Tetsuji Nakagawa
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2010

Dependency Tree-based Sentiment Classification using CRFs with Hidden Variables
Tetsuji Nakagawa | Kentaro Inui | Sadao Kurohashi
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

WISDOM: A Web Information Credibility Analysis System
Susumu Akamine | Daisuke Kawahara | Yoshikiyo Kato | Tetsuji Nakagawa | Kentaro Inui | Sadao Kurohashi | Yutaka Kidawara
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

2007

A Hybrid Approach to Word Segmentation and POS Tagging
Tetsuji Nakagawa | Kiyotaka Uchimoto
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

Multilingual Dependency Parsing Using Global Features
Tetsuji Nakagawa
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Guessing Parts-of-Speech of Unknown Words Using Global Information
Tetsuji Nakagawa | Yuji Matsumoto
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2004

Chinese and Japanese Word Segmentation Using Word-Level and Character-Level Information
Tetsuji Nakagawa
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

2002

Detecting Errors in Corpora Using Support Vector Machines
Tetsuji Nakagawa | Yuji Matsumoto
COLING 2002: The 19th International Conference on Computational Linguistics

Revision Learning and its Application to Part-of-Speech Tagging
Tetsuji Nakagawa | Taku Kudo | Yuji Matsumoto
Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics