Yikai Zhou
2020
Uncertainty-Aware Curriculum Learning for Neural Machine Translation
Yikai Zhou | Baosong Yang | Derek F. Wong | Yu Wan | Lidia S. Chao
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Neural machine translation (NMT) has been shown to benefit from curriculum learning, which presents examples in an easy-to-hard order at different training stages. The keys lie in assessing data difficulty and model competence. We propose uncertainty-aware curriculum learning, motivated by two intuitions: 1) the higher the uncertainty in a translation pair, the more complex and rarer the information it contains; and 2) the end of the decline in model uncertainty indicates the completion of the current training stage. Specifically, we use the cross-entropy of an example as its data difficulty and exploit the variance of distributions over the weights of the network to represent model uncertainty. Extensive experiments on various translation tasks reveal that our approach outperforms the strong baseline and related methods in both translation quality and convergence speed. Quantitative analyses reveal that the proposed strategy offers NMT the ability to automatically govern its learning schedule.
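The two ingredients named in the abstract, per-example difficulty and a stopping signal based on model uncertainty, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes token probabilities for each reference sentence are already available from some scoring model, and it approximates model uncertainty by the variance of a score across several stochastic forward passes (for example, with dropout left on) rather than by a distribution over network weights. All function names, the `tolerance` parameter, and the data layout are hypothetical.

```python
import math
from statistics import pvariance


def cross_entropy(token_probs):
    """Mean negative log-probability of the reference tokens (data difficulty)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def order_by_difficulty(examples):
    """Return example ids sorted easy-to-hard (low to high cross-entropy).

    `examples` maps an example id to the list of probabilities that a
    scoring model (an assumption here) assigns to its reference tokens.
    """
    return sorted(examples, key=lambda ex_id: cross_entropy(examples[ex_id]))


def model_uncertainty(scores_per_pass):
    """Variance of a model score across stochastic forward passes.

    This is a stand-in for the weight-distribution variance in the abstract.
    """
    return pvariance(scores_per_pass)


def stage_finished(uncertainty_history, tolerance=1e-3):
    """True once uncertainty no longer drops by more than `tolerance`."""
    if len(uncertainty_history) < 2:
        return False
    return uncertainty_history[-2] - uncertainty_history[-1] <= tolerance
```

A training loop built on this sketch would start with the easiest portion of `order_by_difficulty(...)`, record `model_uncertainty(...)` after each evaluation, and admit harder examples once `stage_finished(...)` returns true.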
Self-Paced Learning for Neural Machine Translation
Yu Wan | Baosong Yang | Derek F. Wong | Yikai Zhou | Lidia S. Chao | Haibo Zhang | Boxing Chen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Recent studies have shown that the training of neural machine translation (NMT) can be facilitated by mimicking the learning process of humans. Nevertheless, the success of this kind of curriculum learning relies on the quality of an artificial schedule drawn up with handcrafted features, e.g. sentence length or word rarity. We make this procedure more flexible by proposing self-paced learning, where the NMT model is allowed to 1) automatically quantify its learning confidence over training examples; and 2) flexibly govern its learning by regulating the loss at each iteration step. Experimental results over multiple translation tasks demonstrate that the proposed model yields better performance than strong baselines and models trained with human-designed curricula in both translation quality and convergence speed.
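The idea of regulating the loss by the model's own confidence can be illustrated with a small sketch. The abstract does not give the weighting formula, so this is an assumption rather than the paper's method: confidence is taken to be `exp(-variance)` of an example's loss across several stochastic forward passes, and the batch loss is the confidence-weighted mean of per-example losses. The function names and the input layout are hypothetical.

```python
import math
from statistics import pvariance


def confidence(losses_per_pass):
    """Map loss variance across stochastic passes to a weight in (0, 1].

    Assumed form: stable losses (low variance) give confidence near 1.
    """
    return math.exp(-pvariance(losses_per_pass))


def self_paced_loss(batch):
    """Confidence-weighted mean loss over a batch.

    `batch` is a list with one entry per example; each entry is the list of
    losses observed for that example over several stochastic forward passes.
    """
    weights = [confidence(passes) for passes in batch]
    mean_losses = [sum(passes) / len(passes) for passes in batch]
    weighted_total = sum(w * l for w, l in zip(weights, mean_losses))
    return weighted_total / sum(weights)
```

Under this weighting, an example whose loss fluctuates strongly between passes contributes less to the update than one the model scores consistently, so the effective curriculum follows the model's confidence instead of a handcrafted feature such as sentence length.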
Co-authors
- Lidia S. Chao 2
- Yu Wan 2
- Derek F. Wong (黄辉) 2
- Baosong Yang 2
- Boxing Chen 1