Eyup Yilmaz
2022
D2U: Distance-to-Uniform Learning for Out-of-Scope Detection
Eyup Yilmaz | Cagri Toraman
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Supervised training with cross-entropy loss implicitly forces models to produce probability distributions that follow a discrete delta distribution. Model predictions at test time are expected to resemble delta distributions if the classifier determines the class of an input correctly. However, the shape of the predicted probability distribution can become similar to the uniform distribution when the model cannot infer properly. We exploit this observation to detect out-of-scope (OOS) utterances in conversational systems. Specifically, we propose a zero-shot post-processing step, called Distance-to-Uniform (D2U), that exploits not only the classification confidence score but also the shape of the entire output distribution. We then combine it with a learning procedure that uses D2U for loss calculation in the supervised setup. We conduct experiments on six publicly available datasets. Experimental results show that OOS detection performance is improved by our post-processing when there is no OOS training data, as well as by the D2U learning procedure when OOS training data is available.
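The abstract does not specify which distance measure D2U uses, so the following is only a minimal illustrative sketch, assuming KL divergence from the uniform distribution as the distance; the function name d2u_score and the thresholding idea are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch of a distance-to-uniform style post-processing score.
# Assumption: KL divergence between the predicted distribution and the uniform
# distribution is used as the distance; the paper may use a different measure.
import numpy as np

def d2u_score(probs: np.ndarray, eps: float = 1e-12) -> float:
    """Distance between a predicted class distribution and the uniform distribution.

    A small distance means the output is nearly uniform (high uncertainty),
    which can be flagged as a potential out-of-scope input.
    """
    probs = np.clip(probs, eps, 1.0)
    k = probs.shape[-1]
    uniform = np.full(k, 1.0 / k)
    # KL(p || u) = sum_i p_i * log(p_i / u_i)
    return float(np.sum(probs * np.log(probs / uniform)))

# Usage: a confident prediction lies far from uniform; an uncertain one lies close.
confident = np.array([0.95, 0.02, 0.02, 0.01])
uncertain = np.array([0.28, 0.26, 0.24, 0.22])
print(d2u_score(confident))  # larger distance -> likely in-scope
print(d2u_score(uncertain))  # small distance  -> likely out-of-scope
```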
Large-Scale Hate Speech Detection with Cross-Domain Transfer
Cagri Toraman | Furkan Şahinuç | Eyup Yilmaz
Proceedings of the Thirteenth Language Resources and Evaluation Conference
The performance of hate speech detection models relies on the datasets on which the models are trained. Existing datasets are mostly prepared with a limited number of instances or hate domains that define hate topics. This hinders large-scale analysis and transfer learning with respect to hate domains. In this study, we construct large-scale tweet datasets for hate speech detection in English and a low-resource language, Turkish, each consisting of 100k human-labeled tweets. Our datasets are designed to have an equal number of tweets distributed over five domains. The experimental results, supported by statistical tests, show that Transformer-based language models outperform conventional bag-of-words and neural models by at least 5% in English and 10% in Turkish for large-scale hate speech detection. The performance is also scalable to different training sizes: 98% of the performance in English and 97% in Turkish is recovered when 20% of the training instances are used. We further examine the generalization ability of cross-domain transfer among hate domains. We show that, on average, 96% of the performance on a target domain is recovered by other domains for English, and 92% for Turkish. Gender and religion generalize more successfully to other domains, while sports fails the most.