Why Is Gradient Descent Important to Know?

In the previous tutorial, I asked how we can find the correct values of $\theta$ that minimize our cost function $J(\theta)$.

In this tutorial, you will understand the general idea of how gradient descent works and the mathematics behind it, and you will implement a simple Python illustration.

What is Gradient Descent?

If you are not familiar with the term gradient descent, it is an optimization algorithm for finding the minimum of a function. What I mean by that is that we are searching for the input value that gives the lowest output of that function.

While going through textbooks or courses, this function is often called the loss/cost function or even an objective function.  

Before going directly into the explanation I promised last week, let’s start with a simple experiment by working through the calculation behind gradient descent by hand.

Using Desmos for graphical visualization, let’s say you have the function $f(x) = x^{2}-4x+2$ plotted. Given the function, we want to find the value of $x$ that minimizes it, and by reading the value from the plot, the minimum is at $(2, -2)$.
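Since the plot itself is not reproduced here, a minimal sketch like the one below can stand in for it: it simply evaluates the function at a few integer points so you can see that the lowest output occurs at $x = 2$ (the helper name f is my own and not part of the original script).

# evaluate f(x) = x^2 - 4x + 2 at a few points to locate the minimum numerically
def f(x):
    return x ** 2 - 4 * x + 2

for x in range(0, 5):
    print(x, f(x))  # prints 0 2, 1 -1, 2 -2, 3 -1, 4 2 -> lowest output at x = 2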

How to Compute Gradient Descent by Hand

One thing I want you to remember from calculus is that if we take the derivative of the function $f(x) = x^{2}-4x+2$ with respect to $x$ and set it equal to $0$, we get the following:

$f(x) = x^{2}-4x+2$

$\frac{df}{dx} = 2x - 4 = 0$

$2x = 4$

$x = 2$
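If you want to double-check the calculus, here is a minimal sketch using sympy (assuming it is installed in your environment); it differentiates the function and solves for the critical point.

# symbolic check of the hand calculation (requires sympy)
import sympy as sp

x = sp.Symbol('x')
f = x ** 2 - 4 * x + 2
df = sp.diff(f, x)       # 2*x - 4
print(sp.solve(df, x))   # [2]  -> the critical point
print(f.subs(x, 2))      # -2   -> the value of the function at the minimum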

A question I now have for you: find the global minimum of the function $f(x) = x^{2}-4x+2$ starting from the point $x=5$.

Now the way we tackle this problem with gradient descent is different. We start with an initial value $x = 5$. Given our initial numerical value, we plug it into the derivative.

$f(x) = x^{2}-4x+2$

$\frac{df}{dx} = 2x - 4$

$= 2(5) - 4$

$= 6$

Once we insert our starting value into the derivative, the result is $6$. Remember that the minimum of the function is at $x = 2$, where the derivative is $0$.

After taking the derivative, we have a positive result, $6$. This value indicates the direction we should take: since the derivative is positive, the function is increasing at $x = 5$, and because we are to the right of the minimum at $x = 2$, we should move backward (toward smaller $x$).

Likewise, if the situation were the opposite and we started at a point less than $2$, the derivative would be negative and we would want to move forward.

Note: It’s important to realize that the derivative of our current guess tells us both which direction to move (its sign) and, roughly, how close we are to the minimum (its magnitude), since the slope flattens out as we approach it.

$x_{i+1} = x_{i} - \alpha \frac{df}{dx}$

By making use of the equation above, where:

  • $x_{i}$ - is our current guess
  • $\alpha$ - is the learning rate (it controls how large each step is)
  • $\frac{df}{dx}$ - is the derivative of our function evaluated at $x_{i}$ (the minus sign means we move in the opposite direction of the derivative)
  • $x_{i+1}$ - is the next guess (see the code sketch right after this list)
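To make the update rule concrete, here is a minimal sketch of a single gradient-descent step with the learning rate $\alpha = 0.15$ used in the worked example below; the names derivative and gradient_step are my own (the full program later in the post defines the same derivative).

# one step of gradient descent: x_{i+1} = x_i - alpha * df/dx evaluated at x_i
def derivative(x):
    return 2 * x - 4  # derivative of f(x) = x^2 - 4x + 2

def gradient_step(x, alpha=0.15):
    return x - alpha * derivative(x)

print(gradient_step(5))  # 5 - 0.15 * 6 = 4.1 (up to floating-point rounding)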

Let’s review a quick example by hand by performing 2 iterations of gradient descent. I strongly recommend you grab a piece of paper and a pen.

$x_{i+1} = x_{i} - \alpha \cdot \frac{df}{dx}$

$x_{1} = x_{0} - 0.15 \cdot \frac{df}{dx}$

$x_{1} = 5 - 0.15 \cdot 6$

$x_{1} = 4.1$

After plugging in the numbers and performing the calculation, the result is $4.1$. Pay close attention: you will notice how the guess moves closer to the minimum with each iteration.

Now we take our result and repeat the same process; the only difference is the starting point. Instead of $5$, we use $4.1$.

$f(x) = x^{2}-4x+2$

$\frac{df}{dx} = 2x - 4$

$= 2(4.1) - 4$

$= 4.2$

$x_{2} = x_{1} - 0.15 \cdot \frac{df}{dx}$

$x_{2} = 4.1 - 0.15 \cdot 4.2$

$x_{2} = 3.47$

By looking at the result, you will notice after the second iteration, we got a new value of $3.47$ which is much closer to our minimum. And by repeating this process, we will eventually converge to our minimum value.

Note: it is essential to keep track of the size of each step, the difference between the previous and the current guess (called height in the code below). When this difference shrinks toward zero, it is a clear signal we are reaching the bottom.

Applying Gradient Descent in Python

To perform the process correctly, let’s write a simple Python program and follow the steps listed below.

  1. Obtain our objective function.
  2. Initialize randomly a value $x$ from which to start the descent.
  3. Define the learning rate that determines how quickly we will converge to the minimum.
  4. Get the derivative of that value $x$.
  5. Progress to descend by the learning rate multiplied by the derivative.
  6. Update the old value of $x$ with the new value.
  7. Inspect your stopping condition.
  8. If the condition is fulfilled, stop. If not, repeat from step 4.

Now let’s conduct a little experiment. Set up your environment, open a new file, title it gradient_descent_illustration.py, save it, and insert the following code. Let’s roll.

current_guess = 5  # we randomly start at x=5
alpha = 0.15  # the learning rate
total_iteration = 30  # total number of times we will run the algorithm
current_iteration = 0  # keep track of the current iteration
precision = 0.0001  # determines the stop condition of the step-wise descent
height = float('inf')  # initialize the step size to infinity so the loop runs at least once

First, let’s go over the crucial constants we just defined:

  • current_guess – our randomly picked starting point.
  • alpha – the learning rate.
  • total_iteration – the maximum number of times we will repeat the process.
  • precision – the threshold on the difference between the previous and the current guess that determines our stop condition.
  • height – set to infinity by default and updated after each iteration with the size of the most recent step.
# the derivative of our function (x^2 - 4x + 2)
def derivative(x):
    """
    :param x: the current value of x (numerical value)
    :return: the derivative of the function evaluated at x
    """
    return 2 * x - 4

Next, we define a function derivative that receives the current value of $x$ as an input parameter and returns a numerical value: the derivative of the function evaluated at that point.
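As a quick sanity check (these two calls are my own addition, not part of the script), you can confirm the function matches the hand calculation:

print(derivative(5))  # 6, the slope at our starting point, as computed by hand
print(derivative(2))  # 0, the slope at the minimum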

# keep looping while the difference between our previous guess
# and current guess is still larger than the precision and we
# haven't reached the total number of iterations defined.

while height > precision and current_iteration < total_iteration:
    previous_guess = current_guess  # keep track of our previous guess

    # perform gradient descent
    current_guess = previous_guess - alpha * derivative(current_guess)

    # increment the iteration counter after each update
    current_iteration = current_iteration + 1

    # keep track of the difference between our previous and current guess
    height = abs(current_guess - previous_guess)

    print(f"Epoch: {current_iteration}/{total_iteration}\t"
          f" x: {current_guess:.4f}\theight {height:.4f}")

To begin updating our $x$ value, we execute the process we performed by hand repeatedly, as long as two conditions hold: the difference between the previous and the current guess is still larger than our precision value, and current_iteration is less than total_iteration. Each time we go through the loop, we update our guess and increment current_iteration.

Note: This process will keep executing until either of the two conditions fails, that is, until the step size drops below precision or we hit the maximum number of iterations.

python gradient_descent_illustration.py
 
Epoch: 1/30      x: 4.1000      height 0.9000
Epoch: 2/30      x: 3.4700      height 0.6300
Epoch: 3/30      x: 3.0290      height 0.4410
Epoch: 4/30      x: 2.7203      height 0.3087
Epoch: 5/30      x: 2.5042      height 0.2161
Epoch: 6/30      x: 2.3529      height 0.1513
Epoch: 7/30      x: 2.2471      height 0.1059
Epoch: 8/30      x: 2.1729      height 0.0741
Epoch: 9/30      x: 2.1211      height 0.0519
Epoch: 10/30     x: 2.0847      height 0.0363
Epoch: 11/30     x: 2.0593      height 0.0254
Epoch: 12/30     x: 2.0415      height 0.0178
Epoch: 13/30     x: 2.0291      height 0.0125
Epoch: 14/30     x: 2.0203      height 0.0087
Epoch: 15/30     x: 2.0142      height 0.0061
Epoch: 16/30     x: 2.0100      height 0.0043
Epoch: 17/30     x: 2.0070      height 0.0030
Epoch: 18/30     x: 2.0049      height 0.0021
Epoch: 19/30     x: 2.0034      height 0.0015
Epoch: 20/30     x: 2.0024      height 0.0010
Epoch: 21/30     x: 2.0017      height 0.0007
Epoch: 22/30     x: 2.0012      height 0.0005
Epoch: 23/30     x: 2.0008      height 0.0004
Epoch: 24/30     x: 2.0006      height 0.0002
Epoch: 25/30     x: 2.0004      height 0.0002
Epoch: 26/30     x: 2.0003      height 0.0001
Epoch: 27/30     x: 2.0002      height 0.0001

Once we execute the saved gradient_descent_illustration.py script, we will get the printout on our console.
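As an optional extra that is not part of the original script, you could hypothetically append a line after the loop to compare the final guess with the analytic minimum $x = 2$ we derived earlier:

# optional: compare the final guess with the analytic minimum x = 2
print(f"Final guess: {current_guess:.4f}, distance from the minimum: {abs(current_guess - 2):.4f}")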

Conclusion

To conclude this tutorial, you discovered the basic concept of how gradient descent works, which will be very useful all through your machine learning journey. This is why you must understand the inner workings of this algorithm. You learned:

  • The simplest form of the gradient descent algorithm.
  • The simple implementation in Python.
  • An intuitive understanding of this algorithm, which prepares you to apply it to real-world problems.

To get full access to the source code used in all tutorials, leave your email address in any of the page’s subscription forms.

In the next tutorial, we will continue on how we can find the correct values of $\theta$ that minimize our cost function $J(\theta)$ or the “mean squared error” for a linear regression model from your training data.


Do you have any questions about this post or Gradient Descent? Leave a comment and ask questions, I’ll do my best to answer.

