2022
Japanese Named Entity Recognition from Automatic Speech Recognition Using Pre-trained Models
Seiichiro Kondo, Naoya Ueda, Teruaki Oka, Masakazu Sugiyama, Asahi Hentona, Mamoru Komachi
Proceedings of the 36th Pacific Asia Conference on Language, Information and Computation
TMU NMT System with Automatic Post-Editing by Multi-Source Levenshtein Transformer for the Restricted Translation Task of WAT 2022
Seiichiro Kondo, Mamoru Komachi
Proceedings of the 9th Workshop on Asian Translation
In this paper, we describe our TMU English–Japanese systems submitted to the restricted translation task at WAT 2022 (Nakazawa et al., 2022). In this task, we translate an input sentence with the constraint that certain words or phrases (called restricted target vocabularies (RTVs)) should be contained in the output sentence. To satisfy this constraint, we address this task using a combination of two techniques. One is lexical-constraint-aware neural machine translation (LeCA) (Chen et al., 2020), which is a method of adding RTVs at the end of input sentences. The other is multi-source Levenshtein transformer (MSLevT) (Wan et al., 2020), which is a non-autoregressive method for automatic post-editing. Our system generates translations in two steps. First, we generate the translation using LeCA. Subsequently, we filter the sentences that do not satisfy the constraints and post-edit them with MSLevT. Our experimental results reveal that 100% of the RTVs can be included in the generated sentences while maintaining the translation quality of the LeCA model on both English to Japanese (En→Ja) and Japanese to English (Ja→En) tasks. Furthermore, the method used in previous studies requires an increase in the beam size to satisfy the constraints, which is computationally expensive. In contrast, the proposed method does not require a similar increase and can generate translations faster.
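As a rough illustration of this two-step pipeline, the Python sketch below appends the RTVs to the source input, checks the generated translation against the constraints, and routes failures to a post-editing step. The translate_leca and post_edit_mslevt functions and the <sep> token are hypothetical placeholders standing in for the actual LeCA and MSLevT models, not the paper's implementation.

    SEP = "<sep>"  # hypothetical separator token between source and RTVs

    def build_leca_input(source: str, rtvs: list[str]) -> str:
        """LeCA-style input: append the restricted target vocabularies
        (RTVs) to the end of the source sentence."""
        if not rtvs:
            return source
        return f"{source} {SEP} " + f" {SEP} ".join(rtvs)

    def satisfies_constraints(hypothesis: str, rtvs: list[str]) -> bool:
        """Check that every RTV appears in the translation (a simple
        substring test; a real system would match at the token level)."""
        return all(rtv in hypothesis for rtv in rtvs)

    def translate_with_constraints(source, rtvs, translate_leca, post_edit_mslevt):
        # Step 1: translate with the RTVs appended to the input (LeCA).
        hypothesis = translate_leca(build_leca_input(source, rtvs))
        # Step 2: only outputs that still miss an RTV are post-edited
        # with the multi-source Levenshtein transformer (MSLevT).
        if not satisfies_constraints(hypothesis, rtvs):
            hypothesis = post_edit_mslevt(source, hypothesis, rtvs)
        return hypothesis

Because the filter forwards only constraint-violating outputs to MSLevT, most sentences are decoded once, which is consistent with the faster generation reported in the abstract.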
2021
Sentence Concatenation Approach to Data Augmentation for Neural Machine Translation
Seiichiro Kondo, Kengo Hotate, Tosho Hirasawa, Masahiro Kaneko, Mamoru Komachi
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Neural machine translation is now widely used for its high translation accuracy, but it is also known to perform poorly on long sentences, a tendency that is especially prominent for low-resource languages. We assume that these problems are caused by the scarcity of long sentences in the training data. We therefore propose a data augmentation method for handling long sentences. Our method is simple: we use only the given parallel corpora as training data and generate long sentences by concatenating two sentences. Our experiments confirm that, despite its simplicity, the proposed data augmentation improves long sentence translation. Moreover, the proposed method improves translation quality further when combined with back-translation.
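The augmentation itself can be pictured with a short Python sketch: synthetic long training pairs are built by concatenating two existing parallel pairs. How the pairs are chosen (random sampling with a fixed seed here) is an assumption for illustration, not necessarily the paper's exact procedure.

    import random

    def augment_by_concatenation(pairs, num_new, seed=0):
        """pairs: list of (source, target) sentence pairs.
        Returns num_new synthetic long pairs, each made by
        concatenating two pairs drawn from the original corpus."""
        rng = random.Random(seed)
        augmented = []
        for _ in range(num_new):
            (src1, tgt1), (src2, tgt2) = rng.sample(pairs, 2)
            augmented.append((f"{src1} {src2}", f"{tgt1} {tgt2}"))
        return augmented

    corpus = [("I have a cat .", "猫を飼っています。"),
              ("It is raining .", "雨が降っています。"),
              ("She reads books .", "彼女は本を読みます。")]
    long_pairs = augment_by_concatenation(corpus, num_new=2)
    # Training then proceeds on corpus + long_pairs.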
Machine Translation with Pre-specified Target-side Words Using a Semi-autoregressive Model
Seiichiro Kondo, Aomi Koyama, Tomoshige Kiyuna, Tosho Hirasawa, Mamoru Komachi
Proceedings of the 8th Workshop on Asian Translation (WAT2021)
We introduce our TMU Japanese-to-English system, which employs a semi-autoregressive model, to tackle the WAT 2021 restricted translation task. In this task, we translate an input sentence with the constraint that some words, called restricted target vocabularies (RTVs), must be contained in the output sentence. To satisfy this constraint, we use a semi-autoregressive model, namely RecoverSAT, due to its ability (known as “forced translation”) to insert specified words into the output sentence. When using forced translation, the order in which the RTVs are inserted is a critical problem. In our system, we obtain word alignments between the source sentence and the corresponding RTVs using GIZA++, and then sort the RTVs in the order of their corresponding words or phrases in the source sentence. Using the model with RTVs in this sorted order, we succeeded in inserting all the RTVs into the output sentences for more than 96% of the test sentences. Moreover, we confirmed that sorting the RTVs improved the BLEU score compared with presenting them in random order.
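The RTV ordering step can be illustrated with a small Python sketch: the RTVs are sorted by the positions of their aligned source-side words. The paper obtains these alignments with GIZA++; the dictionary-based alignment format below is a simplification for illustration.

    def sort_rtvs(rtvs, alignments):
        """rtvs: restricted target vocabularies (target-side words).
        alignments: mapping from each RTV to the index of its first
        aligned word or phrase in the source sentence."""
        return sorted(rtvs, key=lambda rtv: alignments[rtv])

    # Example: RTVs for a Japanese source sentence, with alignments
    # (source-word indices) as an aligner such as GIZA++ might produce.
    rtvs = ["went", "Taro", "school"]
    alignments = {"Taro": 0, "school": 2, "went": 3}
    print(sort_rtvs(rtvs, alignments))  # ['Taro', 'school', 'went']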