Kenneth Borup

Even your Teacher Needs Guidance: Ground-Truth Targets Dampen Regularization Imposed by Self-Distillation

Research output: Contribution to book/anthology/report/proceeding › Article in proceedings › Research › peer-review

Knowledge distillation is classically a procedure where a neural network is trained on the output of another network along with the original targets in order to transfer knowledge between the architectures. The special case of self-distillation, where the network architectures are identical, has been observed to improve generalization accuracy. In this paper, we consider an iterative variant of self-distillation in a kernel regression setting, in which successive steps incorporate both model outputs and the ground-truth targets. This allows us to provide the first theoretical results on the importance of using the weighted ground-truth targets in self-distillation. Our focus is on fitting nonlinear functions to training data with a weighted mean square error objective function suitable for distillation, subject to ℓ2 regularization of the model parameters. We show that any such function obtained with self-distillation can be calculated directly as a function of the initial fit, and that infinitely many distillation steps yield the same optimization problem as the original with amplified regularization. Furthermore, we provide a closed-form solution for the optimal choice of weighting parameter at each step, and show how to efficiently estimate this weighting parameter for deep learning, significantly reducing the computational requirements compared to a grid search.
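
The iteration described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation; it assumes each step refits a ridge-regularized kernel regressor to the blended target `alpha * y + (1 - alpha) * previous_fit`, with a constant weight `alpha` and a constant penalty `lam` (both names are hypothetical, and the abstract allows the weight to vary per step). Under that assumption every eigen-direction of the kernel matrix evolves independently, so a scalar recursion per eigenvalue `d` is enough to show the behaviour.

```python
def shrink(d, lam):
    """Ridge shrinkage factor d / (d + lam) for a kernel eigenvalue d >= 0."""
    return d / (d + lam)


def self_distill_coeffs(d, z, lam, alpha, steps):
    """Fit coefficient along one eigen-direction after each distillation step.

    d     : eigenvalue of the kernel matrix (illustrative input)
    z     : coefficient of the ground-truth targets along that direction
    lam   : ridge penalty, assumed identical at every step
    alpha : weight on the ground truth, assumed constant across steps
    """
    s = shrink(d, lam)
    c = s * z  # step 0: ordinary ridge fit on the ground truth alone
    history = [c]
    for _ in range(steps):
        target = alpha * z + (1.0 - alpha) * c  # blend truth and last fit
        c = s * target                          # refit with the same penalty
        history.append(c)
    return history


def closed_form(d, z, lam, alpha, t):
    """Step-t coefficient computed directly from the initial fit."""
    s = shrink(d, lam)
    r = (1.0 - alpha) * s  # per-step contraction factor, 0 <= r < 1
    initial_fit = s * z
    return initial_fit * (alpha * (1.0 - r**t) / (1.0 - r) + r**t)
```

In this sketch the step-`t` coefficient is a fixed multiple of the initial fit, and for `alpha > 0` the recursion converges to `shrink(d, lam / alpha) * z`, that is, a ridge fit with the larger penalty `lam / alpha`. Setting `alpha = 1` reproduces the initial fit at every step, while smaller `alpha` means stronger effective regularization.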

Original language: English
Title of host publication: Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Editors: Marc'Aurelio Ranzato, Alina Beygelzimer, Yann Dauphin, Percy S. Liang, Jenn Wortman Vaughan
Number of pages: 12
Publisher: Neural Information Processing Systems Foundation
Publication year: 2021
Pages: 5316-5327
ISBN (Electronic): 9781713845393
Publication status: Published - 2021
Event: 35th Conference on Neural Information Processing Systems, NeurIPS 2021 - Virtual, Online
Duration: 6 Dec 2021 to 14 Dec 2021

Conference

Conference: 35th Conference on Neural Information Processing Systems, NeurIPS 2021
City: Virtual, Online
Period: 06/12/2021 to 14/12/2021
Series: Advances in Neural Information Processing Systems
Volume: 7
ISSN: 1049-5258

Bibliographical note

Publisher Copyright:
© 2021 Neural Information Processing Systems Foundation. All rights reserved.

ID: 284393732