Different Types of Gradient Descent Algorithms

Let's talk about the different types of gradient descent algorithms. There is actually a whole family of them: mini-batch gradient descent, stochastic gradient descent, gradient descent with momentum, RMSProp, AdaGrad, Adam, BFGS, L-BFGS and many more.

The main problem with batch gradient descent is the fact that it uses the whole training set to compute the gradients at every step, which makes it very slow when the training set is very large. At the opposite extreme, Stochastic Gradient Descent just picks a random instance from the training set at every step and computes the gradient based only on that single instance. Obviously this makes the algorithm much faster, since it has very little data to manipulate at every iteration. It also makes it possible to train on huge training sets, since only one instance needs to be in memory at each iteration (SGD can be implemented as an out-of-core algorithm).
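
To make the contrast concrete, here is a minimal sketch of one update step of each variant, assuming plain linear regression with an MSE cost; the names X, y, theta and the learning rate eta are illustrative, not from the article:

```python
import numpy as np

def batch_gd_step(theta, X, y, eta=0.1):
    """One Batch GD step: gradient computed from the WHOLE training set."""
    m = len(y)
    gradients = (2 / m) * X.T @ (X @ theta - y)
    return theta - eta * gradients

def sgd_step(theta, X, y, eta=0.1):
    """One SGD step: gradient computed from a single random instance."""
    i = np.random.randint(len(y))        # pick one random instance
    xi, yi = X[i:i+1], y[i:i+1]
    gradients = 2 * xi.T @ (xi @ theta - yi)
    return theta - eta * gradients
```

Note how the batch step touches all m instances while the stochastic step touches exactly one, which is why SGD needs so little memory per iteration.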

On the other hand, due to its stochastic (random) nature, this algorithm is much less regular than Batch Gradient Descent: instead of gently decreasing until it reaches the minimum, the cost function will bounce up and down, decreasing only on average. But this is useful when the cost function is very irregular and has multiple local minima, and we need to converge to the single global optimum. The stochastic nature actually helps gradient descent jump out of local minima and gives it a better chance of finding the global minimum; the downside is that the algorithm can never settle at the minimum.

The solution is to gradually reduce the learning rate. The steps start out large, then get smaller and smaller, allowing the algorithm to settle at the global minimum. The function that determines the learning rate at each iteration is called the learning schedule. If the learning rate is reduced too quickly, the algorithm may get stuck in a local minimum or even end up frozen halfway to the minimum; on the other hand, if the learning rate is reduced too slowly, it may jump around the minimum for a long time. So choose your learning schedule very carefully.
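
As an illustration, here is a minimal sketch of SGD with a simple decaying learning schedule, again assuming linear regression with an MSE cost; the schedule hyperparameters t0 and t1, and the whole setup, are assumptions for the example rather than values from the article:

```python
import numpy as np

def learning_schedule(t, t0=5, t1=50):
    """Learning rate shrinks as iterations t grow: large steps first, small later."""
    return t0 / (t + t1)

def sgd(X, y, n_epochs=50):
    m, n = X.shape
    theta = np.random.randn(n)               # random initialization
    for epoch in range(n_epochs):
        for i in range(m):
            idx = np.random.randint(m)        # one random instance per step
            xi, yi = X[idx:idx+1], y[idx:idx+1]
            gradients = 2 * xi.T @ (xi @ theta - yi)
            eta = learning_schedule(epoch * m + i)
            theta = theta - eta * gradients
    return theta
```

Because eta decays with the iteration count, early steps are big enough to escape local minima while later steps are small enough to let the parameters settle near the optimum.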
