Tag Assisted Neural Machine Translation of Film Subtitles
Aren Siekmeier | WonKee Lee | Hongseok Kwon | Jong-Hyeok Lee
Proceedings of the 18th International Conference on Spoken Language Translation (IWSLT 2021)
We implemented a neural machine translation system that uses automatic sequence tagging to improve the quality of translation. Instead of operating on unannotated sentence pairs, our system uses pre-trained tagging systems to add linguistic features to source and target sentences. Our proposed neural architecture learns a combined embedding of tokens and tags in the encoder, and simultaneous token and tag prediction in the decoder. Compared to a baseline with unannotated training, this architecture increased the BLEU score of German to English film subtitle translation outputs by 1.61 points using named entity tags; however, the BLEU score decreased by 0.38 points using part-of-speech tags. This demonstrates that certain token-level tag outputs from off-the-shelf tagging systems can improve the output of neural translation systems using our combined embedding and simultaneous decoding extensions.
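The "combined embedding of tokens and tags" in this abstract can be pictured with a small sketch. The abstract does not say how the two embeddings are combined, so concatenation is an assumption here, and the table sizes, vocabularies, and function names (`make_table`, `combined_embedding`) are hypothetical illustrations rather than the authors' implementation.

```python
import random


def make_table(vocab, dim, seed):
    """Build a toy embedding table: one random vector of length `dim` per symbol."""
    rng = random.Random(seed)
    return {sym: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for sym in vocab}


def combined_embedding(tokens, tags, token_table, tag_table):
    """Concatenate each token vector with the vector of its aligned tag.

    Assumption: concatenation is only one possible way to combine the two
    embeddings; the abstract does not specify the operation.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must be aligned one-to-one")
    return [token_table[tok] + tag_table[tag] for tok, tag in zip(tokens, tags)]


# Hypothetical toy vocabularies; "PER" and "O" mimic named-entity tags.
token_table = make_table(["Anna", "kommt", "heute"], dim=4, seed=0)
tag_table = make_table(["PER", "O"], dim=2, seed=1)

tokens = ["Anna", "kommt", "heute"]
tags = ["PER", "O", "O"]
vectors = combined_embedding(tokens, tags, token_table, tag_table)
```

Each source position ends up with a single vector of length 6 (4 token dimensions plus 2 tag dimensions), so tokens that share a tag also share the last two components.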
Transformer-based Screenplay Summarization Using Augmented Learning Representation with Dialogue Information
Myungji Lee | Hongseok Kwon | Jaehun Shin | WonKee Lee | Baikjin Jung | Jong-Hyeok Lee
Proceedings of the Third Workshop on Narrative Understanding
Screenplay summarization is the task of extracting informative scenes from a screenplay. A screenplay contains turning point (TP) events that change the direction of the story and thus decisively define its structure. Accordingly, the task can be framed as TP identification. Motivated by previous work showing that TPs are related to the dialogues in a screenplay, we propose using dialogue information, one attribute of screenplays. To teach a model this characteristic, we add a dialogue feature to the input embedding. Moreover, to improve on the model architecture of previous studies, we replace the LSTM with a Transformer. We observed that the model identifies TPs in a screenplay better when it uses dialogue information, and that a Transformer-based model outperforms LSTM-based models.
POSTECH Submission on Duolingo Shared Task
Junsu Park | Hongseok Kwon | Jong-Hyeok Lee
Proceedings of the Fourth Workshop on Neural Generation and Translation
In this paper, we propose a transfer-learning-based simultaneous translation model that extends BART. We pre-trained BART on Korean Wikipedia and a Korean news dataset, then fine-tuned it on an additional web-crawled parallel corpus and the official 2020 Duolingo training dataset. In our experiments on the 2020 Duolingo test dataset, our submission achieves a weighted macro F1 score of 0.312 and ranks second among the submitted En-Ko systems.
- Jong-Hyeok Lee 3
- WonKee Lee 2
- Aren Siekmeier 1
- Myungji Lee 1
- Jaehun Shin 1