Description
Microsoft and CMU researchers begin to unravel three mysteries in deep learning related to ensemble, knowledge distillation, and self-distillation. Discover how their work leads to the first theoretical proof ...
Summary
- Under now-standard techniques, such as over-parameterization, batch-normalization, and adding residual links, “modern age” neural network training—at least for image classification tasks and many others—is usually quite stable.
- Does ensemble/knowledge distillation work the same way in deep learning as it does for random feature mappings (namely, the NTK feature mappings)?
- In other words, during knowledge distillation, the individual model is forced to learn every possible view feature, matching the performance of the ensemble.
- At a high level, we view self-distillation as combining ensemble and knowledge distillation in a more compact manner (see the sketch after this list).
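To make the setup concrete, here is a minimal sketch of knowledge distillation from an ensemble, assuming PyTorch. It is not the authors' exact training recipe; the names `teachers`, `student`, and `train_loader`, and the `temperature` and `alpha` hyperparameters, are illustrative placeholders.

```python
# Minimal knowledge-distillation sketch (assumed setup, not the paper's code):
# a student is trained to match the averaged soft labels of an ensemble.
import torch
import torch.nn.functional as F

def distill_from_ensemble(teachers, student, train_loader, epochs=10,
                          temperature=4.0, alpha=0.9, lr=0.1):
    """Train `student` to match the ensemble's softened outputs.

    teachers: list of already-trained models (the ensemble); kept frozen.
    student:  a single model, typically of the same architecture.
    """
    for t in teachers:
        t.eval()
    optimizer = torch.optim.SGD(student.parameters(), lr=lr, momentum=0.9)

    for _ in range(epochs):
        for inputs, labels in train_loader:
            with torch.no_grad():
                # Ensemble output: average the teachers' softened probabilities.
                soft_targets = torch.stack(
                    [F.softmax(t(inputs) / temperature, dim=1) for t in teachers]
                ).mean(dim=0)

            logits = student(inputs)
            # Distillation loss: KL divergence to the ensemble's soft labels,
            # scaled by T^2 as in standard distillation practice.
            kd_loss = F.kl_div(
                F.log_softmax(logits / temperature, dim=1),
                soft_targets,
                reduction="batchmean",
            ) * (temperature ** 2)
            # Standard cross-entropy on the hard labels.
            ce_loss = F.cross_entropy(logits, labels)

            loss = alpha * kd_loss + (1 - alpha) * ce_loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```

In this sketch, self-distillation corresponds to the special case where `teachers` contains a single trained copy of the same architecture as `student`, which is one way to read it as a compact combination of ensemble and knowledge distillation.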