How to compress a neural network. An introduction to weight pruning

By Medium - 2020-10-08

Description

Modern state-of-the-art neural network architectures are HUGE. For instance, you have probably heard about GPT-3, OpenAI’s newest revolutionary NLP model, capable of writing poetry and interactive…

Summary

  • To put this number into perspective, consider the following.
  • However, this is just the tip of the iceberg.
  • During their experiments pruning LeNet for MNIST classification, they found that a significant portion of the weights can be removed without a noticeable increase in the loss (a minimal pruning sketch follows this list).
  • Here, the student model not only sees the training data used for the large model, but new data as well, on which it is fitted to approximate the output of the teacher (a distillation-loss sketch also follows below).
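
As a concrete illustration of the pruning point above, here is a minimal magnitude-pruning sketch using PyTorch's torch.nn.utils.prune utilities. The LeNet-style layer sizes and the 60% pruning ratio are illustrative assumptions, not figures from the article.

    # Minimal magnitude-pruning sketch (assumed setup, not the article's exact model).
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 300), nn.ReLU(),   # MNIST-sized input
        nn.Linear(300, 100), nn.ReLU(),
        nn.Linear(100, 10),
    )

    # Zero out the 60% of weights with the smallest absolute value in each Linear layer.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.6)

    # Inspect the resulting sparsity, then make the pruning permanent.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            sparsity = (module.weight == 0).float().mean().item()
            print(f"{module}: {sparsity:.0%} of weights zeroed")
            prune.remove(module, "weight")

After pruning, the model would normally be fine-tuned for a few epochs to recover any lost accuracy before measuring the final loss.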

 
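For the distillation point in the summary, below is a minimal sketch of a standard soft-target distillation loss (KL divergence between softened teacher and student outputs). The function name, temperature value, and the teacher/student variables are hypothetical, not taken from the article.

    # Minimal knowledge-distillation loss sketch (names and temperature are assumptions).
    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, temperature=4.0):
        """KL divergence between softened teacher and student distributions."""
        soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
        log_student = F.log_softmax(student_logits / temperature, dim=-1)
        # The T^2 factor keeps gradient magnitudes comparable across temperatures.
        return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2

    # Usage on a batch x (labels are not needed for the distillation term itself):
    # teacher.eval()
    # with torch.no_grad():
    #     t_logits = teacher(x)
    # loss = distillation_loss(student(x), t_logits)
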

Topics

  1. Machine_Learning (0.39)
  2. NLP (0.13)
  3. Backend (0.08)

Similar Articles

FastFormers: 233x Faster Transformers inference on CPU

By Medium - 2020-11-04

Since the birth of BERT, Transformer models have dominated NLP in nearly every language-related task, whether it is Question-Answering, Sentiment Analysis, Text classification or Text…

Facebook’s Prophet + Deep Learning = NeuralProphet

By Medium - 2020-12-10

While learning about time series forecasting, sooner or later you will encounter the vastly popular Prophet model, developed by Facebook. It gained a lot of popularity because it provides…

BERT, RoBERTa, DistilBERT, XLNet: Which one to use

By KDnuggets - 2021-01-20

Lately, various improvements over BERT have been shown — and here I will contrast the main similarities and differences so you can choose which one to use in your research or application.

LSTM for time series prediction

By KDnuggets - 2021-01-23

Learn how to develop an LSTM neural network with PyTorch on trading data to predict future prices by mimicking actual values of the time series data.