We explore Interactive Post-Editing (IPE) models for human-in-loop translation to help correct translation errors and rephrase it with a desired style variation. We specifically study verbosity for style variations and build on top of multi-source transformers that can read source and hypothesis to improve the latter with user inputs. Token-level interaction inputs for error corrections and length interaction inputs for verbosity control are used by the model to generate a suitable translation. We report BERTScore to evaluate semantic quality with other relevant metrics for translations from English to German, French and Spanish languages. Our model achieves superior BERTScore over state-of-the-art machine translation models while maintaining the desired token-level and verbosity preference.
Topic models are some of the most popular ways to represent textual data in an interpret-able manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the extent of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for data-sets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
Automatic post-editing (APE) models are usedto correct machine translation (MT) system outputs by learning from human post-editing patterns. We present the system used in our submission to the WMT’21 Automatic Post-Editing (APE) English-German (En-De) shared task. We leverage the state-of-the-art MT system (Ng et al., 2019) for this task. For further improvements, we adapt the MT model to the task domain by using WikiMatrix (Schwenket al., 2021) followed by fine-tuning with additional APE samples from previous editions of the shared task (WMT-16,17,18) and ensembling the models. Our systems beat the baseline on TER scores on the WMT’21 test set.
Neural Conversational QA tasks such as ShARC require systems to answer questions based on the contents of a given passage. On studying recent state-of-the-art models on the ShARC QA task, we found indications that the model(s) learn spurious clues/patterns in the data-set. Further, a heuristic-based program, built to exploit these patterns, had comparative performance to that of the neural models. In this paper we share our findings about the four types of patterns in the ShARC corpus and how the neural models exploit them. Motivated by the above findings, we create and share a modified data-set that has fewer spurious patterns than the original data-set, consequently allowing models to learn better.