Introduction to Machine Learning Normal Distribution & Scatter Plot

December 16, 2021

In this article, you will learn about the normal data distribution and scatter plots.

Before moving ahead, take a look at Introduction to Machine Learning Data Distribution.

Normal Data Distribution

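The normal data distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean: values close to the mean occur most often, and values become less frequent the farther they are from it. Plotted as a graph, the normal distribution forms a bell-shaped curve.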
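For reference, the bell curve is described by the normal probability density function below, where $\mu$ is the mean and $\sigma$ is the standard deviation. These two parameters correspond to the first two arguments of numpy.random.normal() in the examples that follow.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

The curve peaks at $x = \mu$ and is symmetric around it; a larger $\sigma$ makes the bell wider and flatter.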

Example: Create a normal data distribution.

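```python
import numpy

# Draw 100 random values from a normal distribution
# with mean 1.0 and standard deviation 5.0.
x = numpy.random.normal(1.0, 5.0, 100)

print(x)
```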

Output (the values are random, so yours will differ):

```
[ 0.19935991  5.85104715 -2.67213907 -5.97652442  1.75126141  9.8609244
  4.34525602 -8.74664979  3.56081038  5.01468427 -1.38178401 -0.61602218
  1.94855173 10.57462583  2.43083263 -1.36238357 -5.99868417 11.40409819
 -6.99816036 -0.72314799 -1.01607005  0.63735967 -0.52424221  1.06672784
  4.74411588  1.17918339 -2.29418946  6.91767372 -0.57057854  2.95191082
 -5.16932202  0.05963505 -0.70969709  3.26325876  5.39778701 -6.58461994
 -2.72504226  3.40522399  2.48434784  9.6541778  -1.9603614  -7.19355151
 -5.82756284 -2.71365687 -1.98649041 -4.72671319 -4.08510306 -5.09459941
  5.89608513 -1.52963702  4.30785246 -1.84062452 -2.79910216  9.85250189
  0.02733057 -2.37580637  2.30513711  4.89935987  2.41339619  8.24105277
 -6.97220871  7.23408138  5.73830868  1.03452691 -2.4682868  -0.36834531
  2.54009345  2.3091234  -4.61574719  9.75438975 -4.82448548  4.30075834
 11.33703458  2.79215539 -1.28598825  8.65055361  1.62363358  3.74636701
 -0.4942166  -6.60518467 -4.43281626  7.46541364 -3.60203674  3.05693217
 -1.6127856  -1.64327323  0.22141275  4.06846927  1.2275703  -3.09907521
 -6.06620857  8.24082997 -3.47668151  0.68322403 -7.48268384  6.72596703
  5.93251581 -5.96546461 -0.36800936  5.31551061]
```
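Because numpy.random.normal() draws fresh random values on every run, the printed array changes each time. If you want repeatable results, the minimal sketch below assumes NumPy's Generator API (np.random.default_rng) with an arbitrary example seed of 42; its normal() method draws from the same kind of distribution as the legacy call in the example, but from a seeded generator.

```python
import numpy as np

# Assumption: NumPy's Generator API is used instead of the legacy
# numpy.random.normal() so that the random values can be reproduced.
rng = np.random.default_rng(seed=42)          # 42 is an arbitrary example seed
x = rng.normal(loc=1.0, scale=5.0, size=100)  # mean 1.0, standard deviation 5.0

# A second generator created with the same seed yields the identical array.
x_again = np.random.default_rng(seed=42).normal(loc=1.0, scale=5.0, size=100)

print(np.array_equal(x, x_again))  # True
print(x.shape)                     # (100,)
```

If you prefer to keep the legacy numpy.random.normal() call, calling numpy.random.seed() with a fixed value before it has a similar effect.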

Example: Create a normal data distribution graph.

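```python
import numpy
import matplotlib.pyplot as plt

# 100 random values with mean 1.0 and standard deviation 5.0
x = numpy.random.normal(1.0, 5.0, 100)

# Draw a histogram of the values using 50 bins
plt.hist(x, 50)

plt.show()
```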

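We use the numpy.random.normal() method to create an array of 100 values.

The first argument sets the mean to 1.0, and the second sets the standard deviation to 5.0.

The values should therefore be concentrated around 1.0: about 68% of them fall within one standard deviation of the mean (between -4.0 and 6.0), and values far from the mean become increasingly rare, although they are never strictly impossible. The histogram shows this as a roughly bell-shaped cluster of bars around 1.0.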
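These properties can be checked numerically using the parameters from the example code, numpy.random.normal(1.0, 5.0, 100): mean 1.0 and standard deviation 5.0. The minimal sketch below assumes NumPy's Generator API with an arbitrary seed, and a larger sample of 100,000 values so that the estimates settle near the requested parameters (with only 100 values they can drift noticeably).

```python
import numpy as np

mean, std_dev = 1.0, 5.0              # same parameters as the example code
rng = np.random.default_rng(seed=0)   # arbitrary seed so the run is repeatable

# Assumption: a larger sample than the example's 100 values, for stable estimates.
x = rng.normal(mean, std_dev, 100_000)

print(round(x.mean(), 2))   # close to 1.0
print(round(x.std(), 2))    # close to 5.0

# Fraction of values within one standard deviation of the mean (-4.0 to 6.0).
within_one_sd = np.mean(np.abs(x - mean) <= std_dev)
print(round(within_one_sd, 3))   # roughly 0.68
```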

Scatter Plot

A scatter plot is a diagram in which each value in a data set is represented by a dot.

The Matplotlib module has a method for drawing scatter plots, scatter(). It needs two arrays of the same length: one containing the values for the x-axis and the other containing the values for the y-axis.

Example: Use the scatter() method to draw a scatter diagram.

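```python
import matplotlib.pyplot as plt

# x- and y-coordinates of five points; both lists have the same length
x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]

# Draw one dot at each (x, y) pair
plt.scatter(x_axis, y_axis)

plt.show()
```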
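To see how scatter() pairs the two arrays, the minimal sketch below draws the same five points without opening a window (it assumes the non-interactive "Agg" backend) and reads back the dot positions with get_offsets(). It also shows that Matplotlib rejects arrays of different lengths with a ValueError.

```python
import matplotlib
matplotlib.use("Agg")   # non-interactive backend: nothing is displayed
import matplotlib.pyplot as plt

x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]

fig, ax = plt.subplots()
points = ax.scatter(x_axis, y_axis)

# Each (x, y) pair becomes one dot; get_offsets() returns their positions.
offsets = points.get_offsets()
print(offsets.shape)   # (5, 2)

# Arrays of different lengths cannot be paired into dots.
mismatch_error = None
try:
    ax.scatter([1, 2, 3], [4, 5])
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)

plt.close(fig)
```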

Learn more about scatter plots in the Matplotlib Tutorial.

Random Data Distributions

In machine learning, data sets can contain thousands or even millions of data points.

You may not always have real-world data available when testing an algorithm, so you might need to use randomly generated values instead.

The NumPy module can help with this.

Let's create two arrays, each containing 1500 random numbers drawn from a normal data distribution.

The first array has a mean of 3.0 and a standard deviation of 1.0.

The second array has a mean of 8.0 and a standard deviation of 2.0.

Example: Use the NumPy module to draw a scatter plot with 1500 dots.

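```python
import numpy as np
import matplotlib.pyplot as plt

# 1500 x-values: mean 3.0, standard deviation 1.0
x_axis = np.random.normal(3.0, 1.0, 1500)
# 1500 y-values: mean 8.0, standard deviation 2.0
y_axis = np.random.normal(8.0, 2.0, 1500)

# Each (x_axis[i], y_axis[i]) pair becomes one dot
plt.scatter(x_axis, y_axis)

plt.show()
```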
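With arrays generated using these parameters, the dots should cluster around the point (3.0, 8.0) and spread roughly twice as far vertically as horizontally, because the y-values have twice the standard deviation of the x-values. The minimal sketch below checks this numerically without drawing the plot; it assumes NumPy's Generator API with an arbitrary seed in place of the legacy np.random.normal() calls.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed for repeatable output
x_axis = rng.normal(3.0, 1.0, 1500)   # mean 3.0, standard deviation 1.0
y_axis = rng.normal(8.0, 2.0, 1500)   # mean 8.0, standard deviation 2.0

# Center of the point cloud: should be close to (3.0, 8.0).
center = (x_axis.mean(), y_axis.mean())
print(f"center ~ ({center[0]:.2f}, {center[1]:.2f})")

# Vertical spread should be about twice the horizontal spread (2.0 vs 1.0).
spread_ratio = y_axis.std() / x_axis.std()
print(f"y/x spread ratio ~ {spread_ratio:.2f}")
```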