Jingjing Huo


2023

AntContentTech at SemEval-2023 Task 6: Domain-adaptive Pretraining and Auxiliary-task Learning for Understanding Indian Legal Texts
Jingjing Huo | Kezun Zhang | Zhengyong Liu | Xuan Lin | Wenqiang Xu | Maozong Zheng | Zhaoguo Wang | Song Li
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

The objective of this shared task is to understand legal texts, a goal complicated by lengthy and noisy legal documents, domain specificity, and the scarcity of annotated data. To address these challenges, we propose a system that employs a hierarchical model and integrates domain-adaptive pretraining, data augmentation, and auxiliary-task learning techniques. Moreover, to enhance generalization and robustness, we ensemble the models that utilize these diverse techniques. Our system ranked first on the RR sub-task and in the middle for the other two sub-tasks.

2020

Diving Deep into Context-Aware Neural Machine Translation
Jingjing Huo | Christian Herold | Yingbo Gao | Leonard Dahlmann | Shahram Khadivi | Hermann Ney
Proceedings of the Fifth Conference on Machine Translation

Context-aware neural machine translation (NMT) is a promising direction for improving translation quality by making use of additional context, e.g., document-level context or meta-information. Although various architectures and analyses exist, the effectiveness of different context-aware NMT models has not yet been well explored. This paper analyzes the performance of document-level NMT models on four diverse domains with varying amounts of parallel document-level bilingual data. We conduct a comprehensive set of experiments to investigate the impact of document-level NMT. We find that there is no single best approach to document-level NMT, but rather that different architectures come out on top on different tasks. Looking at task-specific problems, such as pronoun resolution or headline translation, we find improvements in the context-aware systems, even in cases where corpus-level metrics like BLEU show no significant improvement. We also show that document-level back-translation significantly helps to compensate for the lack of document-level bi-texts.