Description
Modern state-of-the-art neural network architectures are HUGE. For instance, you have probably heard about GPT-3, OpenAI’s newest revolutionary NLP model, capable of writing poetry and interactive…
Summary
- To give you a perspective about how large this number is, consider the following.
- However, this is just the tip of the iceberg.
- During their experiments with pruning the LeNet for MNIST classification, they found that a significant portion of the weights can be removed without a noticeable increase in the loss.
- Here, the student model not only sees the training data for the big one, but new data as well, where it is fitted to approximate the output of the teacher.