Summary
- A guest blog post by Hugging Face fellow Stas Bekman: "Fit More and Train Faster With ZeRO via DeepSpeed and FairScale". As recent machine learning models have been growing much faster than the amount of GPU memory added to newly released cards, many users are unable to train, or even just load, some of those huge models onto their hardware.
- Following the 80:20 rule, I have only spent a few hours on these benchmarks and I haven't tried to squeeze every MB and second by refining the command line arguments and configuration, since it's pretty obvious from the simple table what you'd want to try next.
- The Magic Behind ZeRO: since transformers only integrated these fabulous solutions and was not part of their invention, I will share the resources where you can discover all the details for yourself.
- You can, of course, modify your own trainer to integrate DeepSpeed and FairScale based on each project's instructions, or you can "cheat" and see how we did it in the transformers Trainer; a minimal sketch of the second route follows below.
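As a rough illustration of that second route, here is a minimal sketch assuming the transformers Trainer integration described in the post, where TrainingArguments accepts a path to a DeepSpeed config file (deepspeed=...) and a sharded_ddp flag for FairScale. The model choice, toy dataset, and config values are illustrative assumptions rather than recommendations, and a DeepSpeed run is normally launched with the deepspeed launcher rather than plain python.

```python
# Minimal sketch of enabling ZeRO through the transformers Trainer.
# Assumes a transformers version with DeepSpeed/FairScale support installed
# (e.g. `pip install transformers deepspeed fairscale`); the model, toy
# dataset, and config values below are placeholders, not recommendations.
import json

import torch
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# A hypothetical minimal ZeRO stage-2 DeepSpeed config, written to disk so
# the Trainer can pick it up; a real run would tune these values.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "train_micro_batch_size_per_gpu": 8,
}
with open("ds_config.json", "w") as fh:
    json.dump(ds_config, fh)

model_name = "bert-base-cased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)


class ToyDataset(torch.utils.data.Dataset):
    """Tiny in-memory dataset so the sketch is self-contained."""

    def __init__(self, tokenizer, n=64):
        enc = tokenizer(["hello world"] * n, padding="max_length",
                        truncation=True, max_length=16)
        self.rows = [
            {"input_ids": enc["input_ids"][i],
             "attention_mask": enc["attention_mask"][i],
             "labels": 0}
            for i in range(n)
        ]

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        return self.rows[i]


training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=8,
    fp16=True,
    # Route 1: hand the Trainer a DeepSpeed ZeRO config file
    # (the script is then normally started with the `deepspeed` launcher).
    deepspeed="ds_config.json",
    # Route 2 (instead of the above): FairScale's sharded DDP, started
    # with torch.distributed across multiple GPUs.
    # sharded_ddp=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=ToyDataset(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```

The same switches are exposed as command-line flags by the example scripts that ship with transformers, so in practice you often only add the extra argument to an existing launch command rather than writing any code.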