HyoJung Han


2023

Bridging Background Knowledge Gaps in Translation with Automatic Explicitation
HyoJung Han | Jordan Boyd-Graber | Marine Carpuat
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Translations help people understand content written in another language. However, even correct literal translations do not fulfill that goal when people lack the necessary background to understand them. Professional translators incorporate explicitations to explain the missing context by considering cultural differences between source and target audiences. Despite its potential to help users, NLP research on explicitation is limited because of the dearth of adequate evaluation methods. This work introduces techniques for automatically generating explicitations, motivated by WikiExpl: a dataset that we collect from Wikipedia and annotate with human translators. The resulting explicitations are useful as they help answer questions more accurately in a multilingual question answering framework.

2022

SimQA: Detecting Simultaneous MT Errors through Word-by-Word Question Answering
HyoJung Han | Marine Carpuat | Jordan Boyd-Graber
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Detractors of neural machine translation admit that while its translations are fluent, it sometimes gets key facts wrong. This is particularly important in simultaneous interpretation, where translations have to be provided as fast as possible: before a sentence is complete. Yet evaluations of simultaneous machine translation (SimulMT) fail to capture whether systems correctly translate the most salient elements of a question: people, places, and dates. To address this problem, we introduce a downstream word-by-word question answering evaluation task (SimQA): given a source language question, translate the question word by word into the target language, and answer as soon as possible. SimQA jointly measures whether the SimulMT models translate the question quickly and accurately, and can reveal shortcomings in existing neural systems, such as hallucinating or omitting facts.

2021

Monotonic Simultaneous Translation with Chunk-wise Reordering and Refinement
HyoJung Han | Seokchan Ahn | Yoonjung Choi | Insoo Chung | Sangha Kim | Kyunghyun Cho
Proceedings of the Sixth Conference on Machine Translation

Recent simultaneous machine translation models are often trained on conventional full-sentence translation corpora, which leads to either excessive latency or the need to anticipate as-yet-unarrived words when the two languages differ significantly in word order. This is unlike human simultaneous interpreters, who produce largely monotonic translations at the expense of the grammaticality of the sentence being translated. In this paper, we thus propose an algorithm to reorder and refine the target side of a full-sentence translation corpus, so that the words and phrases in the source and target sentences are aligned largely monotonically, using word alignment and non-autoregressive neural machine translation. We then train a widely used wait-k simultaneous translation model on this reordered-and-refined corpus. The proposed approach improves BLEU scores, and the resulting translations exhibit enhanced monotonicity with the source sentences.