Accelerating Neural Networks on Mobile and Web with Sparse Inference

By Google AI Blog - 2021-03-09

Description

Posted by Artsiom Ablavatski and Marat Dukhan, Software Engineers, Google Research. On-device inference of neural networks enables a variety of real-time applications, like pose estimation and background blur, in a low-latency and privacy-conscious way.

Summary

  • On-device inference of neural networks enables a variety of real-time applications, like pose estimation and background blur, in a low-latency and privacy-conscious way.
  • In modern on-device inference engines like XNNPACK, the implementation of 1x1 convolutions, as well as other operations in deep learning models, relies on the HWC tensor layout, in which the tensor dimensions correspond to the height, width, and channels (e.g., red, green, or blue) of the input image (see the sketch after this list).
  • Compared with the dense model, the sparse model improved inference speed by a factor of two while achieving landmark quality identical to that of the distilled model.
  • The processing time of the dense model is 2x that of the sparse and distilled models.
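
For intuition, here is a minimal NumPy sketch (not XNNPACK's actual implementation) of why the HWC layout matters: a 1x1 convolution over an HWC tensor reduces to a matrix multiplication of the H*W spatial positions against a channel-to-channel weight matrix, so zeroing weights (sparsity) directly removes multiply-adds. All sizes and names below are hypothetical.

```python
import numpy as np

H, W, C_in, C_out = 4, 4, 8, 16          # hypothetical tensor sizes
x = np.random.rand(H, W, C_in)           # input image tensor in HWC layout
weights = np.random.rand(C_in, C_out)    # 1x1 convolution kernel

# Dense 1x1 convolution: flatten HWC -> (H*W, C_in) and multiply by the kernel.
y_dense = (x.reshape(-1, C_in) @ weights).reshape(H, W, C_out)

# Simulate an unstructured-sparse kernel by zeroing ~80% of the weights;
# a sparse kernel representation would only perform the remaining
# non-zero multiply-adds, which is the source of the speedup.
mask = np.random.rand(C_in, C_out) >= 0.8
sparse_weights = weights * mask
y_sparse = (x.reshape(-1, C_in) @ sparse_weights).reshape(H, W, C_out)
```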


Topics

  1. Machine_Learning (0.23)
  2. NLP (0.14)
  3. Backend (0.05)

Similar Articles

FastFormers: 233x Faster Transformers inference on CPU

By Medium - 2020-11-04

Since the birth of BERT, and of the Transformers that followed it, these models have dominated NLP in nearly every language-related task, whether it is Question-Answering, Sentiment Analysis, Text Classification, or Text…

The Model’s Shipped; What Could Possibly go Wrong

By Medium - 2021-02-18

In our last post we took a broad look at model observability and the role it serves in the machine learning workflow. In particular, we discussed the promise of model observability & model monitoring…

Facebook’s Prophet + Deep Learning = NeuralProphet

By Medium - 2020-12-10

While learning about time series forecasting, sooner or later you will encounter the vastly popular Prophet model, developed by Facebook. It gained popularity because it provides…