Training a neural network involves some form of optimisation (minimisation) of an objective function. The root strategy from which all modern optimisers evolved is stochastic gradient descent, abbreviated SGD.
Its basic update rule is quite simple, but the intuition for why the formula looks the way it does, and why it works, is harder to come by.
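For reference, a common way to write the vanilla SGD update (using $\theta$ for the parameters, $\eta$ for the learning rate, and $L$ for the objective evaluated on a randomly sampled example or mini-batch $(x_i, y_i)$; the notation is mine, not the post's):

$$
\theta_{t+1} = \theta_t - \eta \, \nabla_\theta L(\theta_t; x_i, y_i)
$$

In words: take a step in the direction that locally decreases the loss the fastest, scaled by the learning rate, using a noisy gradient estimate from a small sample of the data.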
This post is an attempt to give you an intuition for the mechanics of this rule.