Introduction to Python SciPy Statistical Significance Tests

In this article, you will learn about Python SciPy statistical significance tests.

Before moving ahead, let’s know a bit about Python SciPy Interpolation.


Statistical Significance Tests

In statistics, statistical significance means that a result is unlikely to have been produced by accident; there is a reason behind it.

In other words, an observed relationship in the data reflects some real effect rather than chance or randomness.

To perform statistical significance tests, SciPy has a module called scipy.stats.

Let’s get started with some concepts that are important when performing statistical significance tests.

Hypothesis in Statistics

A hypothesis test in statistics is a procedure for deciding, on the basis of collected sample data, whether a claim about a population is supported.

Null Hypothesis

The null hypothesis in statistics states that there is no relationship between two variables or sets of data, and that changing one variable does not affect the other.

Example: Placing a phone in sunlight has no effect on its battery life.

Example: Studying in daylight on the roof has no effect on a boy’s exam results.

Alternate Hypothesis

The alternative hypothesis in statistics states that there is a relationship between two variables or sets of data, and that changing one variable does affect the other.

In other words, it is the opposite of the null hypothesis.

Example: Watering a plant affects its growth.

One tailed test

When a hypothesis tests only one side of a value, it is called a “one-tailed test”.

Example: Testing only whether the value of variable x is greater than k (or, in a separate test, only whether it is less than k).

Two tailed test

When a hypothesis tests both sides of a value, it is called a “two-tailed test”.

Example: The null hypothesis is that the value of variable x equals k, and the alternative is that x is either greater than k or less than k. In this case, the value is checked on both sides.
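As an illustration of the two modes (a sketch not taken from the original article; the seeded sample and popmean=0 are assumptions), SciPy’s ttest_1samp() exposes an alternative parameter that switches between a two-tailed and a one-tailed test. Note that alternative requires SciPy 1.6 or newer.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, size=50)  # data drawn around 0.5, not 0

# Two-tailed: is the mean different from 0 in either direction?
two_sided = ttest_1samp(sample, popmean=0, alternative='two-sided')

# One-tailed: is the mean greater than 0?
one_sided = ttest_1samp(sample, popmean=0, alternative='greater')

print(two_sided.pvalue)
print(one_sided.pvalue)
```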

Alpha Value

The alpha value in statistics is the significance level. In other words, the alpha value is a threshold used to decide whether a test result is statistically significant.

Example: It sets how extreme your result must be before the null hypothesis is rejected.

It is usually taken as 0.01, 0.05, or 0.10.

P value

The p-value is the probability, assuming the null hypothesis is true, of obtaining data at least as extreme as what was actually observed.

In other words, it is the probability that a test statistic computed under the assumed probability distribution would be greater than or equal to the observed result purely by chance.
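Putting the alpha value and the p-value together gives the usual decision rule. The sketch below is illustrative only; the shifted loc=1 sample and alpha = 0.05 are assumptions chosen so the two groups genuinely differ.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
value_1 = rng.normal(loc=0, size=50)  # centred at 0
value_2 = rng.normal(loc=1, size=50)  # centred at 1, so the means truly differ

alpha = 0.05
p = ttest_ind(value_1, value_2).pvalue

# Reject the null hypothesis only when p falls below the chosen alpha
if p < alpha:
    print("statistically significant")
else:
    print("not statistically significant")
```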

T-Test

The T-test is used to determine whether there is a significant difference between the means of two sets of data that may look similar by some criteria.

To test whether two samples come from the same distribution, use the function ttest_ind().

Example: Find out whether the given values of variables value_1 and value_2 are from the same distribution.

				
import numpy as np
from scipy.stats import ttest_ind

value_1 = np.random.normal(size=50)
value_2 = np.random.normal(size=50)

result = ttest_ind(value_1, value_2)

print(result)

				
			

To return only the p-value, use the pvalue attribute.

Example: Returning only the p-value using the pvalue attribute.

				
import numpy as np
from scipy.stats import ttest_ind

value_1 = np.random.normal(size=50)
value_2 = np.random.normal(size=50)

result = ttest_ind(value_1, value_2).pvalue

print(result)
				
			

KS-Test

KS-Test is used to check whether if given values follow a set of distribution or not.

It takes following –

  1. This function takes value to be tested
  2. CDF as two parameters – either a string or a callable function

It can be used as a one tailed or two tailed test.

It is default two-tailed. Parameter alternative can be passed as a string with one of two-sided, less or greater.
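For instance, the alternative parameter can be passed like this (a sketch with assumed seeded data):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)
value = rng.normal(size=50)

# Default: two-tailed test against the standard normal CDF
two_sided = kstest(value, 'norm')

# One-tailed variants
less = kstest(value, 'norm', alternative='less')
greater = kstest(value, 'norm', alternative='greater')

print(two_sided.pvalue, less.pvalue, greater.pvalue)
```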

Example: Find out whether the given values follow the normal distribution.

				
import numpy as np
from scipy.stats import kstest

value = np.random.normal(size=50)

result = kstest(value, 'norm')

print(result)
				
			

Statistical Description of Data

To see a summary of values in an array, use the function describe().

It returns the following data.

  1. number of observations (nobs)
  2. minimum and maximum values (minmax)
  3. mean
  4. variance
  5. skewness
  6. kurtosis

Example: Give statistical descriptions of the array’s values.

				
import numpy as np
from scipy.stats import describe

value = np.random.normal(size=50)
result = describe(value)

print(result)
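Since describe() returns a named tuple, each field listed above can also be read individually by name (a small sketch with assumed seeded data):

```python
import numpy as np
from scipy.stats import describe

rng = np.random.default_rng(7)
value = rng.normal(size=50)
result = describe(value)

print(result.nobs)      # number of observations
print(result.minmax)    # (minimum, maximum) pair
print(result.mean)
print(result.variance)
print(result.skewness)
print(result.kurtosis)
```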
				
			

Normality Tests (Skewness and Kurtosis)

Normality tests are based upon skewness and kurtosis.

The normaltest() function returns the p-value for the null hypothesis:

“x comes from a normal distribution”.

Skewness:

It measures the asymmetry of the data.

It is 0 for a normal distribution.

If the data is left-skewed, the skewness is negative.

If the data is right-skewed, the skewness is positive.

Kurtosis:

Kurtosis measures whether a distribution is heavy-tailed or light-tailed compared to a normal distribution.

Positive – heavy-tailed

Negative – light-tailed

Example: Find skewness and kurtosis of values in an array.

				
import numpy as np
from scipy.stats import skew, kurtosis

value = np.random.normal(size=50)

print(skew(value))
print(kurtosis(value))

				
			

Example: Find if the data comes from a normal distribution.

				
import numpy as np
from scipy.stats import normaltest

value = np.random.normal(size=50)

print(normaltest(value))
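To see how the p-value behaves, it can help to compare clearly normal data with clearly non-normal data. This is a sketch; the exponential sample and the sample size of 500 are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(3)
normal_data = rng.normal(size=500)       # drawn from a normal distribution
skewed_data = rng.exponential(size=500)  # strongly right-skewed, non-normal

# A small p-value is evidence against the null hypothesis of normality
print(normaltest(normal_data).pvalue)
print(normaltest(skewed_data).pvalue)
```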

				
			

If you find anything incorrect in the above-discussed topic and have any further questions, please comment below.
