Andrea Caciolai


2023

Mitigating the Burden of Redundant Datasets via Batch-Wise Unique Samples and Frequency-Aware Losses
Donato Crisostomi | Andrea Caciolai | Alessandro Pedrani | Kay Rottmann | Alessandro Manzotti | Enrico Palumbo | Davide Bernardi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

Datasets used to train deep learning models in industrial settings often exhibit skewed distributions, with some samples repeated a large number of times. This paper presents a simple yet effective solution to reduce the computational burden of repeated samples in redundant datasets. Our approach eliminates duplicates at the batch level, without altering the data distribution observed by the model, making it model-agnostic and easy to implement as a plug-and-play module. We also provide a mathematical expression to estimate the reduction in training time that our approach provides. Empirically, we show that our approach significantly reduces training times for various models across datasets with varying redundancy factors, without impacting their performance on the Named Entity Recognition task, both on publicly available datasets and in real industrial settings. In the latter, the approach speeds up training by up to 87%, and by 46% on average, with a relative drop in model performance of at most 0.2%. We finally release a modular and reusable codebase to further advance research in this area.
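The batch-level deduplication described in the abstract can be illustrated with a short sketch. The following PyTorch code is a minimal, hypothetical illustration of the general idea (collapse duplicate rows within a batch, then reweight the per-sample loss by each sample's multiplicity so that gradients match those of the original redundant batch); the function names and the exact weighting scheme are assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of batch-wise deduplication with a frequency-aware
# loss. Not the authors' code: names and weighting are illustrative.
import torch
import torch.nn.functional as F

def dedup_batch(input_ids, labels):
    """Keep one copy of each duplicated sample and count its multiplicity."""
    # Treat each row of token ids as the sample's identity.
    unique_rows, inverse = torch.unique(input_ids, dim=0, return_inverse=True)
    counts = torch.bincount(inverse, minlength=unique_rows.size(0)).float()
    # Index of the first occurrence of each unique row, to gather labels.
    first_idx = torch.full((unique_rows.size(0),), -1, dtype=torch.long)
    for pos, u in enumerate(inverse.tolist()):
        if first_idx[u] == -1:
            first_idx[u] = pos
    return unique_rows, labels[first_idx], counts

def frequency_aware_loss(logits, labels, counts):
    """Per-sample cross-entropy reweighted by each sample's multiplicity,
    so the weighted mean equals the mean over the original redundant batch."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    return (counts * per_sample).sum() / counts.sum()
```

Because the weighted mean recovers the average loss over the original redundant batch, the model observes the same effective data distribution while the forward and backward passes run on fewer rows, which is where the training-time savings come from.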

Regression-Free Model Updates for Spoken Language Understanding
Andrea Caciolai | Verena Weber | Tobias Falke | Alessandro Pedrani | Davide Bernardi
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)

In real-world systems, an important requirement for model updates is to avoid regressions in user experience caused by flips of previously correct classifications to incorrect ones. Several techniques addressing this have been proposed in the recent literature. In this paper, we apply one such technique, focal distillation, to model updates in a goal-oriented dialog system and assess its usefulness in practice. In particular, we evaluate its effectiveness for key language understanding tasks, including sentence classification and sequence labeling; we further assess its effect when applied to repeated model updates over time; and we test its compatibility with mislabeled data. Our experiments on a public benchmark and on data from a deployed dialog system demonstrate that focal distillation can substantially reduce regressions at only minor drops in accuracy, and that it outperforms naive supervised training in challenging mislabeled-data and label-expansion settings.
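The abstract describes focal distillation only at a high level. As a rough illustration, here is a minimal, hypothetical PyTorch sketch of the general technique: a standard knowledge-distillation loss whose per-sample weight is larger on examples the old model classified correctly, discouraging new negative flips. The hyperparameters (alpha, beta, temperature, lam) and their defaults are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sketch of a focal distillation loss for regression-free
# model updates. Hyperparameters are illustrative, not from the paper.
import torch
import torch.nn.functional as F

def focal_distillation_loss(new_logits, old_logits, labels,
                            alpha=1.0, beta=5.0, temperature=2.0, lam=1.0):
    # Standard supervised loss on the new (student) model.
    ce = F.cross_entropy(new_logits, labels)

    # Per-sample KL divergence between softened old (teacher) and new
    # (student) output distributions, scaled by T^2 as in distillation.
    t = temperature
    kl = F.kl_div(
        F.log_softmax(new_logits / t, dim=-1),
        F.softmax(old_logits / t, dim=-1),
        reduction="none",
    ).sum(dim=-1) * (t * t)

    # "Focal" weights: distill everywhere (alpha), but push harder (beta)
    # on samples the old model got right, to reduce negative flips.
    old_correct = (old_logits.argmax(dim=-1) == labels).float()
    weights = alpha + beta * old_correct

    return ce + lam * (weights * kl).mean()
```

The design choice is that mimicking the old model only matters where it was right; on samples it got wrong, the new model is left comparatively free to improve, which is how the method trades a small accuracy cost for far fewer regressions.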