# How to Implement L2 Regularization with Python

One of the most common regularization techniques shown to work well is L2 regularization.

In today’s tutorial, we will cover the fundamentals of this technique and how it prevents our model from overfitting. Once you finish reading this blog, you will know:

• How L2 Regularization takes the sum of squared residuals + 𝜆 (read as lambda) * the sum of the squared weights.
• Essential concepts and terminology you must know.
• How to implement the regularization term from scratch.
• Finally, other types of regularization techniques.

To get a better idea of what this means, continue reading.

## What is Regularization and Why Do We Need It?

In previous blog posts over the past weeks, we have discussed how gradient descent works, linear regression using gradient descent, and stochastic gradient descent. We have seen firsthand how these algorithms learn the relationships within our data by iteratively updating their weight parameters.

While the weight parameters are updated after each iteration, they need to be appropriately tuned for the trained model to generalize, that is, to model the correct relationship and make reliable predictions on unseen data.

Most importantly, besides modeling the correct relationship, we also need to prevent the model from memorizing the training set. One critical technique that has been shown to prevent our model from overfitting is regularization. Another relevant hyperparameter is the learning rate; however, we focus mainly on regularization in this tutorial.

Note: If you don’t understand the logic behind overfitting, refer to this tutorial.

We also have to be careful about how we use the regularization technique. If too much regularization is applied, we can fall into the trap of underfitting.

## Ridge Regression

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_{j}^{2}$$

Here’s the equation of our cost function with the regularization term added. By taking the derivative of the regularized cost function with respect to the weights we get:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left(h_{\theta}(x^{(i)}) - y^{(i)}\right) x_j^{(i)} + \frac{\lambda}{m} \theta_j$$
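As a quick sanity check, the cost function and its gradient translate directly into NumPy. This is an illustrative helper, not the post's own code; it penalizes every weight for simplicity, whereas in practice the bias term is often left out of the penalty:

```python
import numpy as np

def ridge_cost_and_gradient(X, y, theta, lam):
    """Regularized cost J(theta) and its gradient for linear regression.

    X: (m, n) feature matrix, y: (m,) targets,
    theta: (n,) weights, lam: regularization strength (lambda).
    """
    m = X.shape[0]
    errors = X @ theta - y  # h_theta(x^(i)) - y^(i) for each sample
    # Average squared error plus the (lambda / 2m) * sum of squared weights
    cost = (errors @ errors) / (2 * m) + lam * (theta @ theta) / (2 * m)
    # Data-term gradient plus the (lambda / m) * theta penalty gradient
    grad = (X.T @ errors) / m + (lam / m) * theta
    return cost, grad
```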

It’s essential to know that Ridge Regression is defined by a cost function with two terms, as displayed in the equation above:

• The first term should look familiar from this tutorial: it is the average cost/loss over the training set (the sum of squared residuals).
• The second term is new: this is our regularization penalty term, which includes 𝜆 and the squared weights.

The squared value within the second term of the equation adds a penalty to our cost/loss function, and 𝜆 determines how strong the penalty will be.

For the lambda value, it’s important to have this concept in mind:

• If 𝜆 is too large, the penalty will dominate, the weights shrink toward zero, and the line becomes less sensitive to the training data (risking underfitting).
• If 𝜆 = 0, we are only minimizing the first term, with no regularization at all.
• If 𝜆 is small, the penalty is weak, and the line may still overfit the training data.

To choose an appropriate value for lambda, I suggest performing cross-validation over a range of lambda values and picking the one that gives the lowest validation error.
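A minimal sketch of such a lambda sweep with k-fold cross-validation might look like this. The function names and the use of the closed-form ridge solution are assumptions for illustration; the post itself fits models via gradient descent:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^-1 X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def cv_select_lambda(X, y, lambdas, k=5, seed=0):
    """Pick the lambda with the lowest average validation MSE over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for lam in lambdas:
        fold_mses = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            theta = ridge_fit(X[train], y[train], lam)
            fold_mses.append(np.mean((X[val] @ theta - y[val]) ** 2))
        scores.append(np.mean(fold_mses))
    return lambdas[int(np.argmin(scores))]
```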

## Applying Ridge Regression with Python

Now that we understand the essential concept behind regularization, let’s implement this in Python on a randomized data sample.

Open up a brand new file, name it ridge_regression_gd.py, and insert the following code:

Let’s begin by importing the Python libraries we need: NumPy, Seaborn, and Matplotlib.
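Since the original snippet is not reproduced here, the import block described would look something like:

```python
# Numerical computing and plotting libraries used throughout the script
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
```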

Within the ridge_regression function, we perform some initialization.
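The post's own snippet isn't shown here, but a minimal sketch of such a function might look like this; the signature and hyperparameter defaults are assumptions, not the author's exact code:

```python
import numpy as np

def ridge_regression(X, y, lam=1.0, lr=0.01, epochs=1000):
    """Fit a linear model with an L2 penalty using batch gradient descent.

    X: (m, n) feature matrix, y: (m,) targets, lam: regularization strength.
    """
    m, n = X.shape
    theta = np.zeros(n)                    # initialize all weights to zero
    for _ in range(epochs):
        errors = X @ theta - y             # prediction errors on the full batch
        # Gradient of the regularized cost: data term + (lam / m) * theta
        grad = (X.T @ errors) / m + (lam / m) * theta
        theta -= lr * grad                 # gradient descent update
    return theta
```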

For a more thorough treatment of this area, please see this tutorial.

This snippet’s major difference is the regularization term, which penalizes large weights, improving our model’s ability to generalize and reducing overfitting (variance).

For the final step, to walk you through what goes on within the main function, we generate a randomized regression problem.

We then create a list of lambda values, pass each one to the fitting function, and use the last block of code to visualize how the line fits the data points for different values of lambda.
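Putting those steps together, the main routine could be sketched as follows. The data-generation settings and the particular lambda values are illustrative assumptions, and the gradient-descent fit is repeated inline so the sketch is self-contained:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

def ridge_regression(X, y, lam, lr=0.1, epochs=2000):
    """Gradient-descent ridge fit, as described earlier in the post."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        errors = X @ theta - y
        theta -= lr * ((X.T @ errors) / m + (lam / m) * theta)
    return theta

def main():
    # Generate a small randomized regression problem
    rng = np.random.default_rng(42)
    X = rng.uniform(0, 1, size=(100, 1))
    y = 3.0 * X[:, 0] + rng.normal(scale=0.3, size=100)

    # A list of lambda values to compare
    lambdas = [0.0, 10.0, 100.0]

    # Fit one model per lambda and plot how each line fits the data points
    xs = np.linspace(0, 1, 50).reshape(-1, 1)
    plt.scatter(X[:, 0], y, s=10, label="data")
    for lam in lambdas:
        theta = ridge_regression(X, y, lam)
        plt.plot(xs[:, 0], xs @ theta, label=f"lambda={lam}")
    plt.legend()
    plt.savefig("ridge_fits.png")

if __name__ == "__main__":
    main()
```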

To visualize the plot, you can execute the script from your terminal with `python ridge_regression_gd.py`.

To summarize the difference between the two plots above: the value of lambda determines how strong the penalty will be. As we can see from the second plot, with a large value of lambda our model tends to underfit the training set.

## Types of Regularization Techniques

Here are three common types of regularization techniques you will see applied directly to a loss function:
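The three you will most often encounter are L1 regularization (Lasso), L2 regularization (Ridge, which we used in this tutorial), and Elastic Net, which combines both. Their penalty terms are commonly written as:

$$\text{L1:} \quad \lambda \sum_{j=1}^{n} |\theta_j| \qquad \text{L2:} \quad \lambda \sum_{j=1}^{n} \theta_j^2 \qquad \text{Elastic Net:} \quad \lambda_1 \sum_{j=1}^{n} |\theta_j| + \lambda_2 \sum_{j=1}^{n} \theta_j^2$$

The L1 penalty can drive some weights exactly to zero (performing feature selection), while the L2 penalty shrinks all weights smoothly toward zero.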

## Conclusion

In this post, you discovered the underlying concept behind regularization and how to implement it yourself from scratch to understand how the algorithm works. You now know:

• How L2 Regularization takes the sum of squared residuals + lambda * the sum of the squared weights.
• How to choose an appropriate lambda value.
• How to implement the regularization term from scratch in Python.
• A brief overview of other regularization techniques.

