3 deep learning mysteries: Ensemble, knowledge- and self-distillation

By Microsoft Research - 2021-01-19

Description

Microsoft and CMU researchers begin to unravel 3 mysteries in deep learning related to ensemble, knowledge distillation & self-distillation. Discover how their work leads to the first theoretical proof…

Summary

  • With now-standard techniques such as over-parameterization, batch normalization, and residual connections, "modern age" neural network training (at least for image classification and many other tasks) is usually quite stable.
  • Does ensemble/knowledge distillation work the same way in deep learning as it does for random feature mappings (namely, NTK feature mappings)?
  • In other words, during knowledge distillation, the individual model is forced to learn every possible view feature, matching the performance of the ensemble (a minimal code sketch follows this list).
  • At a high level, we view self-distillation as combining ensemble and knowledge distillation in a more compact manner.
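The sketch below is not from the article; it only illustrates the general setup the summary refers to: a student network is trained against the averaged soft labels of an ensemble of teachers, and self-distillation corresponds to the special case where the "ensemble" is a single, previously trained copy of the same architecture. The network sizes, temperature, and loss weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_net(num_classes: int = 10) -> nn.Module:
    # Tiny MLP stand-in for the image classifiers discussed in the article.
    return nn.Sequential(nn.Flatten(),
                         nn.Linear(32 * 32 * 3, 128),
                         nn.ReLU(),
                         nn.Linear(128, num_classes))

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7):
    # Soft-label term: KL divergence between temperature-softened
    # teacher and student distributions (scaled by T^2, as is standard).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean") * temperature ** 2
    # Hard-label term: ordinary cross-entropy on the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# "Ensemble" of independently initialized teachers (assumed already trained);
# their averaged logits are the soft target the student is forced to match.
# For self-distillation, replace this list with one earlier copy of make_net().
teachers = [make_net() for _ in range(3)]
student = make_net()
optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

images = torch.randn(8, 3, 32, 32)      # dummy batch of CIFAR-sized images
labels = torch.randint(0, 10, (8,))

with torch.no_grad():
    teacher_logits = torch.stack([t(images) for t in teachers]).mean(dim=0)

loss = distillation_loss(student(images), teacher_logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In this reading, the soft-label term is what pushes the single student toward the "every possible view feature" behavior described above, while the hard-label term keeps it anchored to the ground-truth labels.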

 

Topics

  1. Machine_Learning (0.6)
  2. Backend (0.2)
  3. NLP (0.13)

Similar Articles

30 Most Asked Machine Learning Questions Answered

By Medium - 2021-03-18

Machine Learning is the path to a better and more advanced future. Machine Learning Developer is among the most in-demand jobs in 2021, and demand is expected to grow by 20–30% over the upcoming 3–5 years. Machine…