In this article, you will learn about normal data distributions and scatter plots.
Machine Learning Scatter Plot: before moving ahead, take a look at Introduction to Machine Learning Data Distribution.
Normal Data Distribution
The normal data distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric about the mean: values close to the mean occur more frequently than values far from it. When graphed, a normal distribution forms a bell-shaped curve.
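The bell shape follows from the normal probability density function, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)), where μ is the mean and σ is the standard deviation. The minimal sketch below evaluates this formula using only Python's standard library. The helper name normal_pdf is hypothetical (it is not a NumPy function), and the values μ = 1.0 and σ = 5.0 were chosen to match the example that follows.

```python
import math


def normal_pdf(x: float, mean: float, std_dev: float) -> float:
    """Hypothetical helper: density of a normal distribution at x."""
    coefficient = 1.0 / (std_dev * math.sqrt(2.0 * math.pi))
    exponent = -((x - mean) ** 2) / (2.0 * std_dev ** 2)
    return coefficient * math.exp(exponent)


mean, std_dev = 1.0, 5.0

# The density is highest at the mean...
print(normal_pdf(mean, mean, std_dev))

# ...and symmetric: points equally far from the mean have equal density.
print(normal_pdf(mean - 3.0, mean, std_dev))
print(normal_pdf(mean + 3.0, mean, std_dev))
```

Moving away from the mean in either direction lowers the density by the same amount, which is why the histogram of normally distributed data piles up in the middle and tapers off on both sides.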
Example: Create a normal data distribution.
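import numpy
# Draw 100 samples from a normal distribution with mean (loc) 1.0 and standard deviation (scale) 5.0
x = numpy.random.normal(1.0, 5.0, 100)
print(x)
Output (the values are random, so yours will differ):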
[ 0.19935991 5.85104715 -2.67213907 -5.97652442 1.75126141 9.8609244
4.34525602 -8.74664979 3.56081038 5.01468427 -1.38178401 -0.61602218
1.94855173 10.57462583 2.43083263 -1.36238357 -5.99868417 11.40409819
-6.99816036 -0.72314799 -1.01607005 0.63735967 -0.52424221 1.06672784
4.74411588 1.17918339 -2.29418946 6.91767372 -0.57057854 2.95191082
-5.16932202 0.05963505 -0.70969709 3.26325876 5.39778701 -6.58461994
-2.72504226 3.40522399 2.48434784 9.6541778 -1.9603614 -7.19355151
-5.82756284 -2.71365687 -1.98649041 -4.72671319 -4.08510306 -5.09459941
5.89608513 -1.52963702 4.30785246 -1.84062452 -2.79910216 9.85250189
0.02733057 -2.37580637 2.30513711 4.89935987 2.41339619 8.24105277
-6.97220871 7.23408138 5.73830868 1.03452691 -2.4682868 -0.36834531
2.54009345 2.3091234 -4.61574719 9.75438975 -4.82448548 4.30075834
11.33703458 2.79215539 -1.28598825 8.65055361 1.62363358 3.74636701
-0.4942166 -6.60518467 -4.43281626 7.46541364 -3.60203674 3.05693217
-1.6127856 -1.64327323 0.22141275 4.06846927 1.2275703 -3.09907521
-6.06620857 8.24082997 -3.47668151 0.68322403 -7.48268384 6.72596703
5.93251581 -5.96546461 -0.36800936 5.31551061]
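The first argument of numpy.random.normal() is the mean (loc) and the second is the standard deviation (scale). With only 100 values the sample statistics can stray noticeably from 1.0 and 5.0, but they settle as the sample grows. The sketch below assumes NumPy's newer Generator API (np.random.default_rng), whose normal() method takes the same loc, scale and size parameters. The seed and the sample size of 100,000 are arbitrary choices for illustration.

```python
import numpy as np

# Arbitrary seed, used only so the run is reproducible.
rng = np.random.default_rng(seed=0)

# Same parameters as the example above, but many more samples
# so the estimates settle near the true mean and standard deviation.
samples = rng.normal(loc=1.0, scale=5.0, size=100_000)

print(f"sample mean: {samples.mean():.3f}")
print(f"sample standard deviation: {samples.std():.3f}")
```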
Example: Create a normal data distribution graph.
import numpy
import matplotlib.pyplot as plt
x = numpy.random.normal(1.0, 5.0, 100)
# Draw a histogram of the 100 values grouped into 50 bins
plt.hist(x, 50)
plt.show()
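We use the numpy.random.normal() method to create an array of 100 values.
We have specified a mean of 1.0 and a standard deviation of 5.0.
Most values should therefore lie close to 1.0: for a normal distribution, about 68% of values fall within one standard deviation of the mean (here between -4.0 and 6.0), and values further away become increasingly rare.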
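plt.hist(x, 50) splits the range of the data into 50 equal-width bins and draws one bar per bin. The sketch below computes the same kind of binning numerically with np.histogram and checks the 68-95-99.7 rule: roughly 68%, 95% and 99.7% of normally distributed values lie within one, two and three standard deviations of the mean. It assumes NumPy's Generator API; the seed and the larger sample size are arbitrary choices so that the proportions come out stable.

```python
import numpy as np

# Arbitrary seed and sample size; a large sample makes the proportions stable.
rng = np.random.default_rng(seed=1)
mean, std_dev = 1.0, 5.0
x = rng.normal(loc=mean, scale=std_dev, size=100_000)

# Split the data range into 50 equal-width bins, like plt.hist(x, 50).
counts, bin_edges = np.histogram(x, bins=50)
print(len(counts), len(bin_edges))  # 50 bins are bounded by 51 edges
print(counts.sum())  # each sample is counted in exactly one bin

# Fraction of samples within 1, 2 and 3 standard deviations of the mean.
fractions = {
    k: float(np.mean(np.abs(x - mean) <= k * std_dev)) for k in (1, 2, 3)
}
for k, frac in fractions.items():
    print(f"within {k} standard deviation(s): {frac:.3f}")
```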
Scatter Plot
A scatter plot is a diagram in which each value in a data set is represented by a dot.
The Matplotlib module provides the scatter() method for drawing scatter plots. It requires two arrays of the same size: one containing the values for the x-axis and the other containing the values for the y-axis.
Example: Use the scatter() method to draw a scatter diagram.
import matplotlib.pyplot as plt
x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]
plt.scatter(x_axis, y_axis)
plt.show()
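Because scatter() pairs the two arrays position by position, the i-th dot is drawn at (x_axis[i], y_axis[i]). The minimal sketch below shows that pairing and what happens when the arrays differ in length. It assumes a headless environment (hence the non-interactive "Agg" backend) and uses Matplotlib's object-oriented Axes.scatter(), which returns a collection whose get_offsets() holds the plotted pairs. The mismatched lists are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is opened
import matplotlib.pyplot as plt

x_axis = [10, 12, 18, 25, 33]
y_axis = [18, 29, 34, 39, 42]

fig, ax = plt.subplots()
points = ax.scatter(x_axis, y_axis)

# Each dot is placed at the pair (x_axis[i], y_axis[i]), position by position.
print(points.get_offsets())

# With lists of different lengths the values cannot be paired up,
# so scatter() raises a ValueError instead of drawing anything.
mismatch_error = None
try:
    ax.scatter([1, 2, 3], [4, 5])
except ValueError as err:
    mismatch_error = err
    print("ValueError:", err)

plt.close(fig)
```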
Learn more about scatter plots in the Matplotlib Tutorial.
Random Data Distributions
In machine learning, data sets can contain thousands or even millions of data points.
When testing an algorithm, you may not have real-life data to use, so you might need to use randomly generated values instead.
The NumPy module can help with this.
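Randomly generated test data is most useful when the same data can be regenerated later, for example to compare two versions of an algorithm on identical input. The sketch below assumes NumPy's Generator API (np.random.default_rng) rather than the legacy numpy.random functions used elsewhere in this article; the seed values 42 and 7 are arbitrary.

```python
import numpy as np

# Two generators created with the same seed...
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

# ...produce identical "random" test data, so an experiment can be re-run exactly.
data_a = rng_a.normal(loc=3.0, scale=1.0, size=5)
data_b = rng_b.normal(loc=3.0, scale=1.0, size=5)
print(np.array_equal(data_a, data_b))  # True

# A generator with a different seed produces different values.
data_c = np.random.default_rng(seed=7).normal(loc=3.0, scale=1.0, size=5)
print(np.array_equal(data_a, data_c))
```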
Let’s make two arrays of 1500 random numbers, each drawn from a normal data distribution.
The first array has a mean of 3.0 and a standard deviation of 1.0.
The second array has a mean of 8.0 and a standard deviation of 2.0.
Example: Use the NumPy module to draw a scatter diagram with 1500 dots.
import numpy as np
import matplotlib.pyplot as plt
x_axis = np.random.normal(3.0, 1.0, 1500)
y_axis = np.random.normal(8.0, 2.0, 1500)
plt.scatter(x_axis, y_axis)
plt.show()
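The scatter diagram from this example should show a cloud of dots centred near (3.0, 8.0) that is spread about twice as far vertically as horizontally, with no trend linking x and y because the two arrays are generated independently. The sketch below checks those properties numerically instead of visually. It assumes NumPy's Generator API; the seed is an arbitrary choice for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed for reproducibility

# Same parameters as the example: 1500 values for each axis.
x_axis = rng.normal(loc=3.0, scale=1.0, size=1500)
y_axis = rng.normal(loc=8.0, scale=2.0, size=1500)

# Centre and spread of the cloud of dots along each axis.
print(f"x: mean {x_axis.mean():.2f}, std {x_axis.std():.2f}")
print(f"y: mean {y_axis.mean():.2f}, std {y_axis.std():.2f}")

# The arrays are drawn independently, so their correlation should be near 0.
correlation = np.corrcoef(x_axis, y_axis)[0, 1]
print(f"correlation: {correlation:.3f}")
```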