codingstreets

Introduction to Machine Learning Train/Test

In this article, you will learn about the Train/Test method in Machine Learning.

Machine Learning Scale – Before moving ahead, let’s take a look at Machine Learning Scale.

Evaluate Model

When we use Machine Learning, we create models to predict the outcome of certain events. In the previous lesson, for instance, we predicted the ID of a student when we knew the weight and Roll_No.

To determine whether the model is accurate enough, we could use an approach called Train/Test.

Train/Test is an approach to test how accurate your models are.

It’s called Train/Test because it splits the data set into two parts: a training set and a testing set.

*You train the model using the training set.

*You test the model using the testing set.

*Training the model means creating the model.

*Testing the model means measuring the accuracy of the model.
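The four points above can be sketched in a few lines. This is only a minimal sketch on made-up data: the values of x and y, the 16/4 split, and the straight-line model are illustrative assumptions, not the data set used in the rest of this article.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up data for illustration: y is roughly 2*x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 20)
y = 2 * x + 1 + rng.normal(0, 0.5, 20)

# Split: the first 80 percent is for training, the rest for testing.
train_x, test_x = x[:16], x[16:]
train_y, test_y = y[:16], y[16:]

# Train: create the model from the training set only.
model = np.poly1d(np.polyfit(train_x, train_y, 1))

# Test: measure the accuracy on data the model has not seen.
test_r2 = r2_score(test_y, model(test_x))
print(test_r2)
```

The model never sees test_x or test_y while it is being fitted, which is the whole point of the split.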

Assume Data Set

Start with a data set you want to test.

Example: Assume a data set of 50 students in a class.

				
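import numpy as np
import matplotlib.pyplot as plt

# Fix the seed so the same random numbers are generated on every run
np.random.seed(2)

# 50 values each: normal(mean, standard deviation, size)
x = np.random.normal(10, 20, 50)
y = np.random.normal(25, 30, 50) / x

plt.scatter(x, y)
plt.show()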

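Each point in the plot stands for one student.

The x values are drawn from a normal distribution with a mean of 10 and a standard deviation of 20.

The y values are drawn from a normal distribution with a mean of 25 and a standard deviation of 30, and then divided by x.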

Split Into Train/Test

The training set should consist of 80 percent of the data, which is the first 40 of our 50 students.

The testing set is the remaining 20 percent, which is the last 10 students.

train_x = x[:40]
train_y = y[:40]

test_x = x[40:]
test_y = y[40:]
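Slicing takes the first 80 percent of the data in its stored order. When the order of the data might matter, a shuffled split is a common alternative. The sketch below uses scikit-learn's train_test_split on the same 50 random values; the choice of random_state=2 is an arbitrary assumption made only to keep the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

np.random.seed(2)
x = np.random.normal(10, 20, 50)
y = np.random.normal(25, 30, 50) / x

# test_size=0.2 keeps 20 percent of the points for testing;
# random_state fixes the shuffle so the split is the same on every run.
train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.2, random_state=2
)

print(len(train_x), len(test_x))  # prints: 40 10
```

The x and y values stay paired, because both arrays are shuffled with the same permutation.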

Display the Training Set

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2)

x = np.random.normal(10, 20, 50)
y = np.random.normal(25, 30, 50) / x

# First 40 points (80 percent) for training, last 10 for testing
train_x = x[:40]
train_y = y[:40]

test_x = x[40:]
test_y = y[40:]

plt.scatter(train_x, train_y)
plt.show()

Display the Testing Set

import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2)

x = np.random.normal(10, 20, 50)
y = np.random.normal(25, 30, 50) / x

# First 40 points (80 percent) for training, last 10 for testing
train_x = x[:40]
train_y = y[:40]

test_x = x[40:]
test_y = y[40:]

plt.scatter(test_x, test_y)
plt.show()

Fit the Data Set

What does the data set look like? Let’s try to fit it with polynomial regression and draw the resulting curve.

To draw the curve through the data points, we use the plot() method of the Matplotlib module.

Example: Draw a polynomial regression line through the data points.

				
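import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2)

x = np.random.normal(10, 20, 50)
y = np.random.normal(25, 30, 50) / x

# First 40 points (80 percent) for training, last 10 for testing
train_x = x[:40]
train_y = y[:40]

test_x = x[40:]
test_y = y[40:]

# Fit a polynomial of degree 2 to the training set
model = np.poly1d(np.polyfit(train_x, train_y, 2))

# Evaluate the curve across the range of the training data
line = np.linspace(train_x.min(), train_x.max(), 100)

plt.scatter(train_x, train_y)
plt.plot(line, model(line))
plt.show()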

The result can support my idea of fitting the data set with a polynomial regression, although the model might yield some strange results if we try to predict values outside the range of the data set.

But what about the R-squared score? The R-squared score is a good indicator of how well the model fits my data set.

R2

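R2 is also known as R-squared. It measures how much of the variation in the y values the model explains: 1 means a perfect fit, 0 means the model does no better than always predicting the mean of y, and the score can even be negative when the model does worse than that.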
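To make the score less of a black box, here is a minimal sketch that computes R2 by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, and compares it with r2_score from scikit-learn. The helper name r_squared and the small lists of values are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

def r_squared(y_true, y_pred):
    """Hypothetical helper: R2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left after the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of predicting the mean
    return 1 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]
good = [1.1, 1.9, 3.2, 3.8]  # close to the true values
bad = [4.0, 3.0, 2.0, 1.0]   # the true values in reverse order

print(r_squared(y_true, good))  # close to 1 (about 0.98)
print(r_squared(y_true, bad))   # negative (about -3)
print(r2_score(y_true, good))   # same value as the hand-written version
```

Predictions that are worse than simply guessing the mean push the residual sum above the total sum, which is why the score drops below zero.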

Example: Let’s see how well the model fits the training data.

				
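import numpy as np
from sklearn.metrics import r2_score
np.random.seed(2)

x = np.random.normal(10, 20, 50)
y = np.random.normal(25, 30, 50) / x

# First 40 points (80 percent) for training, last 10 for testing
train_x = x[:40]
train_y = y[:40]

test_x = x[40:]
test_y = y[40:]

# Fit a polynomial of degree 2 to the training set
model = np.poly1d(np.polyfit(train_x, train_y, 2))

# Score the model on the same data it was trained on
r2 = r2_score(train_y, model(train_x))
print(r2)

Output (from our original run; the exact value depends on how the data is split) -

0.03958675473193263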

Start with Testing Set

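The training data gives us a first score for the model: the closer R2 is to 1, the better the fit, and a score near 0 like the one above means the model explains very little of the training data.

Next we verify the model with the testing data to see whether it gives a similar result. Note that the example below draws 99 points instead of 50, so slicing at index 80 leaves 19 points for testing, and it fits a polynomial of degree 4.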

Example: Let’s find the R2 score when using testing data.

				
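import numpy as np
from sklearn.metrics import r2_score
np.random.seed(2)

# 99 points this time: 80 for training, 19 for testing
x = np.random.normal(10, 20, 99)
y = np.random.normal(25, 30, 99) / x

train_x = x[:80]
train_y = y[:80]

test_x = x[80:]
test_y = y[80:]

# Fit a polynomial of degree 4 to the training set
model = np.poly1d(np.polyfit(train_x, train_y, 4))

# Score the model on data it has not seen
r2 = r2_score(test_y, model(test_x))

print(r2)

Output - 

-0.2399553815802038

A negative score means the model predicts the testing data worse than simply using the mean of test_y, so this model does not carry over to new data.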
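One way to explore a result like this is to score several polynomial degrees on both sets and compare them side by side. The sketch below reuses the 99 random values from the example above; the range of degrees (1 to 4) and the scores dictionary are assumptions chosen for illustration, and no particular output is claimed here.

```python
import numpy as np
from sklearn.metrics import r2_score

np.random.seed(2)
x = np.random.normal(10, 20, 99)
y = np.random.normal(25, 30, 99) / x

train_x, train_y = x[:80], y[:80]
test_x, test_y = x[80:], y[80:]

# Map each degree to its (training R2, testing R2) pair.
scores = {}
for degree in (1, 2, 3, 4):
    model = np.poly1d(np.polyfit(train_x, train_y, degree))
    scores[degree] = (
        r2_score(train_y, model(train_x)),
        r2_score(test_y, model(test_x)),
    )
    print(degree, scores[degree])
```

A training score that rises with the degree while the testing score stays low or negative is a typical sign that the model fits noise rather than a real relationship.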

Predict Values

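Once we have confirmed that the model is good enough, we can begin predicting new values. With scores as low as the ones above, the prediction below only demonstrates the mechanics and should not be relied on.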

				
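import numpy as np
import matplotlib.pyplot as plt
np.random.seed(2)

x = np.random.normal(10, 20, 50)
y = np.random.normal(25, 30, 50) / x

# First 40 points (80 percent) for training, last 10 for testing
train_x = x[:40]
train_y = y[:40]

test_x = x[40:]
test_y = y[40:]

# Fit a polynomial of degree 2 to the training set
model = np.poly1d(np.polyfit(train_x, train_y, 2))

# Evaluate the curve across the range of the training data
line = np.linspace(train_x.min(), train_x.max(), 100)

plt.scatter(test_x, test_y)
plt.plot(line, model(line))
plt.show()

# Predict the y value for x = 10
print(model(10))

Output (from our original run; the exact value depends on how the data is split) -

2.1378158872036197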
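Because a polynomial can behave strangely outside the data it was fitted on, it can help to refuse predictions for x values the model has never seen. The following is a minimal sketch of that idea; predict_within_range is a hypothetical helper, and the small data set (y equal to x squared) is made up so that the result is easy to check.

```python
import numpy as np

def predict_within_range(model, train_x, value):
    """Hypothetical helper: predict only inside the training range."""
    low, high = float(np.min(train_x)), float(np.max(train_x))
    if not low <= value <= high:
        raise ValueError(
            f"{value} is outside the training range [{low:.2f}, {high:.2f}]"
        )
    return float(model(value))

# Illustrative data, not the article's data set: y = x squared.
train_x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
train_y = train_x ** 2
model = np.poly1d(np.polyfit(train_x, train_y, 2))

print(predict_within_range(model, train_x, 2.5))  # about 6.25

try:
    predict_within_range(model, train_x, 10)
except ValueError as error:
    print(error)
```

Raising an error is only one possible design; returning the prediction together with a warning would work as well.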

If you find anything incorrect in the above-discussed topic and have any further questions, please comment below.
