Introduction to Linear Regression

Linear regression is one of the most widely known and well-understood algorithms in the Machine Learning landscape. Since it’s one of the most common questions in interviews for a data scientist.

In this tutorial, you will understand the basics of the linear regression algorithm. How it works, how to use it and finally how you can evaluate its performance.

Previously I gave you highlights on different learning problems under Supervised Learning, which could be Regression or Classification. Let’s see an example of a regression problem.

House Price Linear Regression

A predictor line which predicts the estimates the housing price machine learning — Figure 1: A predictor line which predicts the estimates of housing price.

Let’s say we are using the housing prices dataset from the City of Belgrade, Serbia. In this dataset, we have the number of houses of different sizes sold for different prices.

Estimate the price of the house size given the size in squared feet machine learning — Figure 2: Estimate the price of the house size given the size in squared feet.

Given this dataset, your landlord is looking to put one of his houses up for sale and his home is right about 1750 square feet. With the house’s size provided, you want to suggest how for how much he will sell his house. As we fit a straight line through these data points, we may suggest that he could sell it for around $2 million based on the graph.

Linear Regression One Variable

A simple illustration of how a Supervised Learning algorithm works is by feeding the data collected, or the “Training set”, to our learning algorithm. It’s now the algorithms’ job to output a function that estimates a given house’s price.

ML processing. Collecting data, passing the trained data into an algorithm and making predictions — Figure 3: The process of a Supervised Learning Algorithm.

You might be wondering how our estimated function is defined. To remind you, our prediction function for one variable is the equation of a straight line defined as $y = \theta_{0} + \theta_{1} * x$ commonly seen as $y=mx+b$.

Note: $x$ is the house’s size, and $y$ is the price placed on these houses. $\theta_{1}$ is the slope or gradient of a line, and $\theta_{0}$ is the y-intercept or simply where the line crosses the $y$ axis.

Here are some essential notations I will be using consistently.

m = number of training examples
n = number of features
X = feature matrix
x = feature vector
y = target vector
$\theta$ = parameters

Now let’s conduct a little experiment. Set up your environment, open up a new file, title it linear_regression.py, save it, and insert the following code. Let’s roll.

→ Click here to download the code

# import the necessary packages

import numpy as np

from matplotlib import pyplot as plt

from sklearn.linear_model import LinearRegression

Let’s begin by importing libraries as well as modules from Matplotlib, NumPy, and also scikit-learn. If you aren’t yet aware of these collections

Numpy: is made use of for mathematical processing with Python.
Matplotlib: For data visualization in Python. And,
Scikit-learn: Which has the machine learning algorithms we’ll cover today.

The snippet above takes care of importing our needed Python packages. We’ll be using the scikit-learn package, so if you do not already have it set up, make sure you adhere to these guidelines to get it set up on your computer.

We will additionally be utilizing the NumPy library and you might follow the setup process right here and also for Matplotlib.

For training, we will be using the feature “size of the house” as an input feature to predict the price of each house.

Sample of structure data for linear regression with one variable. The size of the house with the price at which it was sold — Figure 4: Training dataset with one feature.

# Defining the size of the house and converting

# our array into an vector. (rank 1 array)

x = np.array([1000, 1750, 2200])

x = x.reshape(-1, 1)

# Defining our target variable which is

# the price of these houses

y = np.array([950000, 2250000, 2400000])

Here $x$ is the house’s size, while $y$ corresponds to the houses’ price.

# instantiate the Linear Regression class and

# training the model with available data-points

classifier = LinearRegression()

classifier.fit(x, y)

# assigning the intercept coefficient to theta0

# and regression coefficient the slope of a line to theta1

theta0 = classifier.intercept_

theta1 = classifier.coef_

Next, let’s start by instantiating the Linear Regression class to a variable classifier. From there, we access the fit function to train the algorithm with the given argument $x$ and $y$.

To visualize the line that best fits our data points, we used $\theta_{0} $ and $\theta_{1}$ along with $x$.

Here’s a quick question for you:

To test the linear relationship of y (dependent) and x (independent) continuous variables, which plot is best suited? Share on X

Yep, you guessed it right. It’s a scatter plot.

Here we plot our training data and the best-fitted line used to determine the house’s price.

Best fit line on a sample dataset — Figure 5: The predictor line that best fits our training dataset.

# The equation of a straight line

y_pred_skl = theta0 + (x * theta1)

# Visualization the line that best fits our dataset

plt.figure(figsize=(14, 8))

plt.scatter(x, y, s=200, marker='*', color='blue', cmap='RdBu')

plt.plot(x, y_pred_skl, c='r')

plt.xlabel('area(sqr ft)')

plt.ylabel('price($)')

plt.title('Linear Regression Sk-learn')

plt.grid()

To begin making housing price predictions, we need to supply the model a new sample it has not seen before and finally execute our script.

Prediction on new sample datapoint linear regression — Figure 6: Estimating the new price of a house given the size in feet squared.

# Make some new predictions given a sample it has never seen

# before using the .predict() function and passing in the

# size of the house

x_test = [1800]

prediction = classifier.predict([x_test])

print("House price for size {}(srft) \

is ${:.2f}".format(x_test[0], prediction[0]))

plt.scatter(x_test[0], prediction[0], s=300, marker='*',

color='green', label='test data')

plt.legend()

plt.show()

Once we execute the saved linear_regression.py script, we will get the resulting price.

[david@neuraspike] python linear_regression.py

[david@neuraspike]

[david@neuraspike] House price for size 1800(srft) is $2055952.38

Now that we’ve learned what working with only one feature vector is like let’s move on towards working with multiple features.

Linear Regression Multiple Variables

Let’s look into Linear Regression with Multiple Variables. It’s known as Multiple Linear Regression.

In the previous example, we had the house size as a feature to predict the price of the house with the assumption of $\hat{y}= \theta_{0} + \theta_{1} * x$.

Sample data with multiple features — Figure 7: Training dataset with multiple features.

Now we were given a lot more information to anticipate the price, like the number of bedrooms and the age of these houses (years). The form of our new assumption will be $\hat{y}(x) = \theta_{0} + \theta_{1} * x_{1} + \theta_{2} * x_{2} + \theta_{3} * x_{3}$.

Instead of writing our new assumption as above, we need to define a mathematical function to generalize when working with more than three attributes, as we will indeed represent as $n$.

To generalize, we will define our feature $x_{0} = 1$ (constant) to avoid the line passing through the origin. Then $\hat{y} = \theta \cdot x^T$ where our predictions is simply the vector product between our parameter vector $\theta$ consisting of $\theta = [\theta_{0},…,\theta_{n}]$ and x consisting of $ x_{0} $ the constant concatenated with $n$ features (houses size, no of bedrooms, home’s age), $ X = [1, x_{1}, …, x_{n}] $.

Let’s conduct another little experiment. Open up another new file, title it multiple_linear_regression.py, save it, and insert the following code.

We will use the same function “model” as above. So please copy and paste it into your new script.

# the size of the house, no of bedrooms,

# and age of the home (years)

X = np.array([

[1000, 3.0, 12],

[1750, 4.0, 10],

[2200, 5.0, 5]

])

# The price of these houses

y = np.array([950000, 2250000, 2400000])

# instantiate the Linear Regression class and

# training the model with available data-points

classifier = LinearRegression()

classifier.fit(X, y)

# Make some new predictions given a sample it has never seen

# before using the .predict() function and passing in the

# size of the house

x_test = np.array([1800, 4.0, 7])

prediction = classifier.predict([x_test])

print("House price with {} srft, {} bedrooms and {} years"

" old is ${:.2f}".format(x_test[0], x_test[1],

x_test[2], prediction[0]))

The only distinction between the snippet we have seen currently and the one from before is from Line 3 – 7. Below we included many more attributes such as the variety of bedrooms and the home’s age to predict the cost of a house.

$ python multiple_linear_regression.py

$ House price with 1800.0 srft, 4.0 bedrooms and

$ 7.0 years old is $1867761.72

Once we execute the python script, we get the output above.

Next, we need to specify a metric we can use to evaluate our linear regression model.

How can we evaluate the error?

To determine if our prediction is good, we need to define the error or residual to be the difference between the target $y$ and our predicted value $\hat{y}$.

Mean squared error explanation — Figure 8: Visualization of how we can evaluate if our prediction is good or not.

If our model predicted well, the difference between our prediction $\hat{y}$ and $y$ would be very small on average.

$ J(\theta) = \frac{1}{2m} \sum_{j} (y^{(j)} – \hat{y}^{(j)})^2$

We can rewrite the equation as below.

$ J(\theta) = \frac{1}{2m} \sum_{j} (y^{(j)} – \theta \cdot x^{(j)T})^2$

Using the equation of the mean squared error above as our cost function, we can get the sum of the differences across multiple samples between our observed values and the predictions. Where $m$ is the total number of samples (the red dots above).

You might ask three questions.

The first being, why are we squaring the difference?
Secondly, why are we dividing by 2?
Thirdly, how does $\theta$ affect the features we are multiply by?

Let’s answer the first question. The reason why we square the difference $(y^{(j)} – \hat{y}^{(j)})^2$ is that bigger mistakes result in even more errors than smaller sized mistakes, meaning our model is punished for making larger errors.

And the second question is to simplify for mathematical convenience. When you differentiate $J(\theta)$, you will get an extra 2. To eliminate that, 2 is kept beforehand in the denominator. Not to overload you, in the next tutorial, I will go into more details with the derivation.

Note: They are many other metrics available. However, the mean squared error is a common choice since it’s computationally convenient.

Thirdly, the best way I like to think of $\theta$ is as a set of weights or parameters that control the system’s behavior. Meaning it determines how each of the features affects the price of the house. So,

Positive weight = If we receive a positive weight, increasing the value of that feature increases the value of our prediction $\hat{y}$
Zero weight = If our feature’s weight is zero, then it does not affect our prediction $\hat{y}$
Negative weight = If we receive a negative weight, increasing the value of that feature decreases the value of our prediction $\hat{y}$

Finding good parameters

Given a representation of the cost function $J(\theta)$ the problem of learning is converted into a basic optimization problem which we will learn more in detail in the next tutorial. A simple quesion I want you to think about is:

How can we find the correct values of Θ that minimizes our cost function J(Θ)? Share on X

Conclusion

To conclude this tutorial, you discovered how to implement linear regression step-by-step with a sample dataset. You learned:

How to call the Linear Regression model from the sklearn module.
How to make predictions using your learned model.
A metric we can use when evaluating your regression model
Three commonly asked questions about linear regression.

In the next tutorial, we will learn how to find both the intercept coefficient and regressions coefficient (also known as weights when referring to deep learning) for a linear regression model from your training data.

Do you have any questions about this post or linear regression? Leave a comment and ask questions, I’ll do my best to answer.

To get access to the source codes used in all of the tutorials, leave your email address in any of the page’s subscription forms.

Articles

Course

Andrew Ng, Machine Learning Specialization, Standford Coursera

Books

To be notified when this next blog post goes live, be sure to enter your email address in the form!

8 Comments

Arpan 5 years ago Reply
Great post, David!
I think you should mention the shortcomings of linear regression as well. For instance, referring to the Anscombe’s Quartet will be really helpful for learners to understand that linear regression can fail if you have non-linear relationships in your data.
- David Praise Chukwuma Kalu Post Author 5 years ago Reply
  That’s absolutely correct Arpan. Thanks for referring to the Anscombe’s Quartet technique.
Denice Peixoto 5 years ago Reply
hello, your site is very good. Following your blog posts.
- David Praise Chukwuma Kalu Post Author 5 years ago Reply
  Thank you very much 😊
Hanzlik 5 years ago Reply
Your reasoning should be accepted as the benchmark when it comes to this topic.
- David Praise Chukwuma Kalu Post Author 5 years ago Reply
  Thank you Hanzlik.
Lukas 5 years ago Reply
May I just say what a relief to discover someone
who actually knows what they’re discussing on the internet.
You certainly understand how to bring a problem to light
and make it important. More people have to check this out and understand this side of your story.
It’s surprising you’re not more popular since you most
certainly possess the gift.
- David Praise Chukwuma Kalu Post Author 5 years ago Reply
  Thank you, Lukas. Soon I’m hoping more people will reach my content. Feel Free to share among your colleges.

Introduction to Linear Regression

House Price Linear Regression

Linear Regression One Variable

Linear Regression Multiple Variables

How can we evaluate the error?

Finding good parameters

Conclusion

Further Reading

Articles

Course

Books

What You Don’t Know About Machine Learning Could Hurt You

Why is Gradient Descent Important to know

Related Posts

How to Flip an Image using Python and OpenCV

Adding a web interface to our NFT Search Engine in 3 steps with Flask

Building an NFT Search Engine in 3 steps using Python and OpenCV

8 Comments

Write A Comment Cancel Reply