In this article, you will learn about linear regression.
Machine Learning Linear Regression: before moving ahead, let’s take a look at Introduction to Machine Learning Scatter Plot.
Regression
Regression is a method for determining the relationship between variables.
In machine learning and statistical modeling, this relationship is used to predict the outcome of future events.
Linear Regression
Linear regression uses the relationship between data points to draw a straight line through them.
This line can then be used to predict future values.
How Does It Work?
Python offers methods to determine an association between data points and draw lines of regression.
Example: Use the scatter() method to draw a scatter diagram.
import matplotlib.pyplot as plt
x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]
plt.scatter(x_axis, y_axis)
plt.show()
Example: Use the SciPy stats module to draw the line of linear regression.
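import matplotlib.pyplot as plt
from scipy import stats
x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]
# Fit the least-squares line; r, p and std_err are not used in this example.
slope, intercept, r, p, std_err = stats.linregress(x_axis, y_axis)
def first_func(x):
    # Return the y value on the regression line for a given x.
    return slope * x + intercept
# Regression-line y value for every x in the data.
info = list(map(first_func, x_axis))
plt.scatter(x_axis, y_axis)  # original data points
plt.plot(x_axis, info)       # regression line
plt.show()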
In the two import statements,
The Matplotlib and SciPy modules are imported to draw the diagram and compute the regression line.
In the x_axis and y_axis assignments,
The x and y values of the data points shown on the graph are defined.
In the stats.linregress() call and the first_func() definition,
The slope and intercept returned by stats.linregress() are used in first_func() to compute a new value. This new value is the position on the y-axis where the regression line places the given x value.
In the list(map(...)) statement,
Each value of x_axis is run through the function. This produces a new list with new values for the y-axis.
In the plt.plot() call,
The line of linear regression is drawn over the scatter plot.
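To make the slope and intercept less of a black box, here is a minimal sketch of the ordinary least-squares formulas that a fit like this is based on, written in plain Python with no third-party modules. The helper name least_squares_fit is an assumption made for illustration and is not part of SciPy.

```python
# Data from the example above.
x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]


def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line.

    Hypothetical helper for illustration; not part of SciPy.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares_fit(x_axis, y_axis)
print(round(slope, 4), round(intercept, 4))  # roughly 0.902 and 14.7209
```

For this data the sketch should agree with the slope and intercept from stats.linregress() up to floating-point rounding.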
Relationship
It is crucial to know how strongly the x-axis values and the y-axis values are related. If there is no relationship, linear regression cannot be used to predict anything.
This relationship is measured by the coefficient of correlation, known as “r.”
The r-value ranges from -1 to 1, where 0 indicates no linear relationship, while 1 and -1 indicate a perfect (100%) linear relationship.
Python and the SciPy module can compute this value; all you need to do is pass in the x and y values.
Example: Check how well the data fits a linear regression.
from scipy import stats
x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]
slope, intercept, r, p, std_err = stats.linregress(x_axis, y_axis)
print(r)
Output -
0.9070296500990837
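The r-value above can also be reproduced by hand. The following is a sketch of the Pearson correlation coefficient in plain Python; the helper name pearson_r is an assumption made for illustration, not a SciPy function.

```python
import math

# Data from the example above.
x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]


def pearson_r(xs, ys):
    """Hypothetical helper: Pearson correlation coefficient of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y, and the variation of each on its own.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(x_axis, y_axis)
print(round(r, 4))  # roughly 0.907
```

The sign of r follows the sign of the slope: a positive r means y tends to rise as x rises.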
Predict Future Values
Now, we can use the fitted line to predict future values.
Example: Predict the sale of product ‘x’ when the x-axis value is 5.
from scipy import stats
x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]
slope, intercept, r, p, std_err = stats.linregress(x_axis, y_axis)
def first_func(x):
    return slope * x + intercept
# Predicted y value for x = 5.
sale = first_func(5)
print(sale)
Output -
19.23089700996677
Bad Result
Let’s create an example where linear regression is not a good way to predict future values.
Example: Create data that gives a bad result for linear regression.
import matplotlib.pyplot as plt
from scipy import stats
x_axis = [10, 8, 1, 5, 39]
y_axis = [1, 29, 43, 32, 5]
slope, intercept, r, p, std_err = stats.linregress(x_axis, y_axis)
def first_func(x):
    return slope * x + intercept
info = list(map(first_func, x_axis))
plt.scatter(x_axis, y_axis)
plt.plot(x_axis, info)
plt.show()
Example: Check whether ‘r’ returns a low value.
from scipy import stats
x_axis = [10, 8, 1, 5, 39]
y_axis = [1, 29, 43, 32, 5]
slope, intercept, r, p, std_err = stats.linregress(x_axis, y_axis)
print(r)
Output -
-0.6779846149737578
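One hedged way to compare the two data sets is the coefficient of determination, R squared, which for a straight-line fit equals r squared and gives the fraction of the variation in y that the line accounts for. The sketch below computes it from the residuals in plain Python; the helper name r_squared is an assumption made for illustration.

```python
def r_squared(xs, ys):
    """Hypothetical helper: R^2 of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope and intercept.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Variation left over after the fit versus total variation in y.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


good = r_squared([10, 12, 18, 25, 33], [18, 29, 34, 39, 42])
bad = r_squared([10, 8, 1, 5, 39], [1, 29, 43, 32, 5])
print(round(good, 2), round(bad, 2))  # roughly 0.82 and 0.46
```

By this measure the line accounts for about 82% of the variation in the first data set but less than half in the second, so predictions from the second fit are much less reliable.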