Dawei Zhu


Is BERT Robust to Label Noise? A Study on Learning with Noisy Labels in Text Classification
Dawei Zhu | Michael Hedderich | Fangzhou Zhai | David Adelani | Dietrich Klakow
Proceedings of the Third Workshop on Insights from Negative Results in NLP

Incorrect labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision. It has been shown that complex noise-handling techniques (modeling, cleaning or filtering the noisy instances) are required to prevent models from fitting this label noise. However, we show in this work that, for text classification tasks with modern NLP models like BERT, over a variety of noise types, existing noise-handling methods do not always improve the model's performance, and may even deteriorate it, suggesting the need for further investigation. We also back our observations with a comprehensive analysis.
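As a hypothetical illustration of the kind of setting studied here (not the authors' code, and not necessarily one of the noise types used in the paper), label noise is often simulated by corrupting clean labels at a chosen rate. The sketch below assumes a "uniform" noise model in which a corrupted label is always moved to a different class chosen uniformly at random; the function name `inject_uniform_noise` and its parameters are illustrative.

```python
import random


def inject_uniform_noise(labels, num_classes, noise_rate, seed=0):
    """Hypothetical helper: with probability `noise_rate`, replace each label
    by a different class chosen uniformly at random; otherwise keep it."""
    rng = random.Random(seed)  # seeded so the corruption is reproducible
    noisy = []
    for y in labels:
        if rng.random() < noise_rate:
            # Pick uniformly among all classes except the true one.
            alternatives = [c for c in range(num_classes) if c != y]
            noisy.append(rng.choice(alternatives))
        else:
            noisy.append(y)
    return noisy


if __name__ == "__main__":
    clean = [0, 1, 2, 0, 1, 2, 0, 1]
    noisy = inject_uniform_noise(clean, num_classes=3, noise_rate=0.3)
    flipped = sum(a != b for a, b in zip(clean, noisy))
    print(f"{flipped} of {len(clean)} labels corrupted")
```

A model trained on `noisy` and evaluated on clean labels gives one way to measure how much a noise-handling method actually helps over plain fine-tuning. Other variants let the corrupted label coincide with the true one, which lowers the effective noise rate.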


Neural Data-to-Text Generation with LM-based Text Augmentation
Ernie Chang | Xiaoyu Shen | Dawei Zhu | Vera Demberg | Hui Su
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

For many new application domains for data-to-text generation, the main obstacle to training neural models is a lack of training data. While usually large numbers of instances are available on the data side, often only very few text samples are available. To address this problem, we propose a novel few-shot approach for this setting. Our approach automatically augments the data available for training by (i) generating new text samples based on replacing specific values with alternative ones from the same category, (ii) generating new text samples based on GPT-2, and (iii) proposing an automatic method for pairing the new text samples with data samples. As the text augmentation can introduce noise to the training data, we use cycle consistency as an objective, in order to make sure that a given data sample can be correctly reconstructed after having been formulated as text (and that text samples can be reconstructed from data). On both the E2E and WebNLG benchmarks, we show that this weakly supervised training paradigm is able to outperform fully supervised sequence-to-sequence models with less than 10% of the training set. By utilizing all annotated data, our model can boost the performance of a standard sequence-to-sequence model by over 5 BLEU points, establishing a new state-of-the-art on both datasets.
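Step (i) can be pictured with a small sketch. This is a hypothetical illustration, not the authors' implementation: it assumes an E2E-style meaning representation stored as a `dict` of slot to value, a table `slot_values` listing the known values for each slot, and that a value can be aligned with the text by exact string match. The names `substitute_values` and `slot_values` are illustrative.

```python
import random
import re


def substitute_values(mr, text, slot_values, rng):
    """Hypothetical value-substitution augmentation: swap each slot value that
    appears verbatim in `text` for another value of the same slot, updating
    the data record and the text consistently."""
    new_mr = dict(mr)
    mapping = {}  # old value -> replacement value
    for slot, value in mr.items():
        if value in mapping:  # same value under two slots: reuse the swap
            new_mr[slot] = mapping[value]
            continue
        if value not in text:
            continue  # value not found in text; leave this slot unchanged
        alternatives = [v for v in slot_values.get(slot, []) if v != value]
        if not alternatives:
            continue
        mapping[value] = rng.choice(alternatives)
        new_mr[slot] = mapping[value]
    if not mapping:
        return new_mr, text
    # Replace all values in one regex pass (longest first) so an inserted
    # replacement is never itself rewritten by a later substitution.
    pattern = re.compile(
        "|".join(re.escape(v) for v in sorted(mapping, key=len, reverse=True))
    )
    return new_mr, pattern.sub(lambda m: mapping[m.group(0)], text)


if __name__ == "__main__":
    mr = {"name": "The Eagle", "food": "French"}
    text = "The Eagle serves French food."
    slot_values = {"name": ["The Eagle", "Blue Spice"], "food": ["French", "Italian"]}
    print(substitute_values(mr, text, slot_values, random.Random(0)))
```

Because exact string matching misses paraphrased or inflected values, a substitution like this can produce mismatched pairs, which is the kind of augmentation noise the cycle-consistency objective described above is meant to counter.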


Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages
Michael A. Hedderich | David Adelani | Dawei Zhu | Jesujoba Alabi | Udia Markus | Dietrich Klakow
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Multilingual transformer models like mBERT and XLM-RoBERTa have obtained great improvements for many NLP tasks on a variety of languages. However, recent works also showed that results from high-resource languages could not be easily transferred to realistic, low-resource scenarios. In this work, we study trends in performance for different amounts of available resources for the three African languages Hausa, isiXhosa and Yorùbá on both NER and topic classification. We show that, in combination with transfer learning or distant supervision, these models can achieve with as little as 10 or 100 labeled sentences the same performance as baselines with much more supervised training data. However, we also find settings where this does not hold. Our discussions and additional experiments on assumptions such as time and hardware restrictions highlight challenges and opportunities in low-resource learning.