How to compress a neural network. An introduction to weight pruning

By Medium - 2020-10-08

Description

Modern state-of-the-art neural network architectures are HUGE. For instance, you have probably heard about GPT-3, OpenAI’s newest revolutionary NLP model, capable of writing poetry and interactive…

Summary

  • To give you some perspective on how large this number is, consider the following.
  • However, this is just the tip of the iceberg.
  • During their experiments with pruning LeNet for MNIST classification, they found that a significant portion of the weights can be removed without a noticeable increase in the loss (see the pruning sketch after this list).
  • Here, the student model not only sees the training data used for the large model but also new data, on which it is fitted to approximate the teacher's output (see the distillation sketch after this list).
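
The article's pruning example is not reproduced here, but a minimal sketch of the idea, magnitude-based weight pruning via PyTorch's torch.nn.utils.prune, is shown below. The LeNet-style layer sizes and the 90% pruning ratio are illustrative assumptions, not figures from the article.

from torch import nn
from torch.nn.utils import prune

# A small LeNet-style classifier for 28x28 MNIST images (sizes are illustrative).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 300), nn.ReLU(),
    nn.Linear(300, 100), nn.ReLU(),
    nn.Linear(100, 10),
)

# Zero out the 90% of weights with the smallest magnitude in each linear layer.
# The 0.9 ratio is an assumption; in practice it is tuned against the validation loss.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.9)
        prune.remove(module, "weight")  # bake the mask in, leaving the pruned weights at zero

# Report the resulting sparsity.
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"Sparsity: {zeros / total:.2%}")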
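For the teacher-student setup in the last point, a minimal knowledge-distillation loss can be sketched as follows; the temperature and the soft/hard weighting alpha are illustrative assumptions, not values from the article.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    # Soft targets: fit the student to approximate the teacher's (softened) output.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: ordinary cross-entropy on the labelled training data.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard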


Topics

  1. Machine_Learning (0.39)
  2. NLP (0.13)
  3. Backend (0.08)

Similar Articles

FastFormers: 233x Faster Transformers inference on CPU

By Medium - 2020-11-04

Since the birth of BERT, Transformers have dominated NLP in nearly every language-related task, whether it is Question Answering, Sentiment Analysis, Text Classification or Text…

Facebook’s Prophet + Deep Learning = NeuralProphet

By Medium - 2020-12-10

While learning about time series forecasting, sooner or later you will encounter the vastly popular Prophet model, developed by Facebook. It gained a lot of popularity because it provides…

BERT, RoBERTa, DistilBERT, XLNet: Which one to use

By KDnuggets - 2021-01-20

Lately, various improvements over BERT have been introduced, and here I will contrast the main similarities and differences so you can choose which one to use in your research or application.