%0 Conference Proceedings
%T Train, Sort, Explain: Learning to Diagnose Translation Models
%A Schwarzenberg, Robert
%A Harbecke, David
%A Macketanz, Vivien
%A Avramidis, Eleftherios
%A Möller, Sebastian
%Y Ammar, Waleed
%Y Louis, Annie
%Y Mostafazadeh, Nasrin
%S Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations)
%D 2019
%8 June
%I Association for Computational Linguistics
%C Minneapolis, Minnesota
%F schwarzenberg-etal-2019-train
%X Evaluating translation models is a trade-off between effort and detail. On the one end of the spectrum there are automatic count-based methods such as BLEU, on the other end linguistic evaluations by humans, which arguably are more informative but also require a disproportionately high effort. To narrow the spectrum, we propose a general approach on how to automatically expose systematic differences between human and machine translations to human experts. Inspired by adversarial settings, we train a neural text classifier to distinguish human from machine translations. A classifier that performs and generalizes well after training should recognize systematic differences between the two classes, which we uncover with neural explainability methods. Our proof-of-concept implementation, DiaMaT, is open source. Applied to a dataset translated by a state-of-the-art neural Transformer model, DiaMaT achieves a classification accuracy of 75% and exposes meaningful differences between humans and the Transformer, amidst the current discussion about human parity.
%R 10.18653/v1/N19-4006
%U https://aclanthology.org/N19-4006
%U https://doi.org/10.18653/v1/N19-4006
%P 29-34