Description
Who’s Adam? Why should we care about “his” friends?!
Summary
- Each of Adam’s friends has contributed to Adam’s personality.
- The gradient at point A is the slope of the parabolic function at that point. It points in the direction of steepest increase, so to minimise the value of the function we move in the opposite direction.
- Thus, with Momentum, if the momentum factor in eq-3 is β, then unlike in SGD the new step is guided not only by the gradients but also by β times the old step.
- Next, for each parameter we store a state referred to as param_state.
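
The points above on gradients, Momentum, and per-parameter state can be sketched together in a few lines of plain Python. This is only an illustration under stated assumptions: the parabola f(x) = x**2, the function names `grad_f` and `momentum_steps`, the learning rate `lr`, and the `"step"` key inside `param_state` are all hypothetical choices, and the update form (new step = β times old step, minus `lr` times the gradient) is one common way of writing Momentum that may differ in detail from eq-3.

```python
def grad_f(x):
    # Assumed example parabola f(x) = x**2; its slope (gradient) at x is 2*x.
    return 2.0 * x


def momentum_steps(x0, lr=0.1, beta=0.9, n_steps=3):
    """Run a few Momentum updates on f(x) = x**2 and return x after each one.

    With beta=0.0 this reduces to plain SGD: the step depends only on the gradient.
    """
    params = {"x": x0}   # hypothetical single parameter named "x"
    state = {}           # maps each parameter name to its param_state
    history = []
    for _ in range(n_steps):
        for name in list(params):
            # Per-parameter state: remembers the previous step for this parameter.
            param_state = state.setdefault(name, {"step": 0.0})
            # New step = beta * old step, plus a move against the gradient.
            step = beta * param_state["step"] - lr * grad_f(params[name])
            param_state["step"] = step
            params[name] = params[name] + step
        history.append(params["x"])
    return history


if __name__ == "__main__":
    print("SGD     :", momentum_steps(1.0, beta=0.0))
    print("Momentum:", momentum_steps(1.0, beta=0.9))
```

Starting from the same point, the Momentum run moves further per update than the SGD run once a previous step has been stored, because each new step carries over β times the old one.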