Statistical Significance Tests
In statistics, statistical significance means that a result has an identifiable cause and is unlikely to have occurred by accident or chance.
In other words, a relationship observed in the data reflects a real effect rather than random variation.
To perform Statistical Significance Tests, SciPy has a module called scipy.stats.
Let’s get started with some concepts that are important when performing statistical significance tests.
Hypothesis in Statistics
A hypothesis test in statistics is a procedure for testing an assumption about a population on the basis of collected data.
The null hypothesis in statistics states that there is no relationship between two variables or sets of data, and that changing one variable does not affect the other.
Example: Placing a phone in sunlight does not affect its battery life.
Example: Studying in daylight on the roof does not affect a boy’s results.
The alternative hypothesis in statistics states that there is a relationship between two variables or sets of data, and that changing one variable does affect the other.
In other words, it is the opposite of the null hypothesis.
Example: Watering a plant affects its growth.
One tailed test
When a hypothesis is tested for only one side of the value, it is called a “one tailed test”.
Example: Testing only whether the value of variable x is greater than k (or only whether it is less than k).
Two tailed test
When a hypothesis is tested for both sides of the value, it is called a “two tailed test”.
Example: The null hypothesis is that the value of variable x equals k, and the alternative is that x is either greater than or less than k. In this case, the value is checked on both sides.
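The one-tailed and two-tailed variants above can be sketched with scipy.stats.ttest_1samp, which accepts an alternative parameter (available in SciPy 1.6 and later). The sample data and the hypothesized mean of 0 here are illustrative assumptions, not part of the original examples:

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
value = rng.normal(loc=0.5, size=50)  # illustrative sample centered near 0.5

# Two-tailed test: the alternative hypothesis is "mean != 0",
# so both sides of the value are checked.
two_sided = ttest_1samp(value, popmean=0, alternative='two-sided')
print(two_sided.pvalue)

# One-tailed test: the alternative hypothesis is "mean > 0",
# so only one side of the value is checked.
one_sided = ttest_1samp(value, popmean=0, alternative='greater')
print(one_sided.pvalue)
```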
The alpha value in statistics is the significance level. In other words, the alpha value is the threshold used to check whether a test result is statistically significant.
Example: How extreme a result must be before the null hypothesis is rejected.
It is usually taken as 0.01, 0.05, or 0.1.
P value
The P-value tells how close to the extreme the data actually is.
In other words, it is the probability of obtaining statistics (such as the mean or standard deviation) of the assumed probability distribution that are greater than or equal to the observed results, assuming the null hypothesis is true.
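As a sketch of how the p-value and the alpha value work together in practice: the 0.05 threshold and the two random samples below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
value_1 = rng.normal(size=50)
value_2 = rng.normal(size=50)

alpha = 0.05  # commonly used significance level (an assumption here)
p = ttest_ind(value_1, value_2).pvalue

# Reject the null hypothesis only when the p-value falls below alpha.
if p < alpha:
    print("Statistically significant: reject the null hypothesis.")
else:
    print("Not significant: fail to reject the null hypothesis.")
```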
T-Test
The T-test is used to determine whether there is a significant difference between the means of two sets of data, which may appear similar by some criteria.
To test whether two sets of values come from the same distribution, use the function ttest_ind().
Example: Find if the given values of value_1 and value_2 come from the same distribution.
import numpy as np
from scipy.stats import ttest_ind

value_1 = np.random.normal(size=50)
value_2 = np.random.normal(size=50)

result = ttest_ind(value_1, value_2)
print(result)
To return only the p-value, use the pvalue property.
Example: Return only the p-value using the pvalue property.
import numpy as np
from scipy.stats import ttest_ind

value_1 = np.random.normal(size=50)
value_2 = np.random.normal(size=50)

result = ttest_ind(value_1, value_2).pvalue
print(result)
KS-Test
The KS-test is used to check whether given values follow a given distribution.
The kstest() function takes two parameters:
- the values to be tested
- the CDF, which can be either a string or a callable function
It can be used as a one-tailed or a two-tailed test.
By default it is two-tailed. The alternative parameter can be passed as a string with one of the values: 'two-sided', 'less', or 'greater'.
Example: Find if the given value follows the normal distribution.
import numpy as np
from scipy.stats import kstest

value = np.random.normal(size=50)

result = kstest(value, 'norm')
print(result)
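The alternative parameter mentioned above can be passed to kstest() to run the one-tailed variants. This sketch reuses normal test data; the seed is an illustrative assumption:

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2)
value = rng.normal(size=50)

# Default two-tailed test against the standard normal CDF.
res_two_sided = kstest(value, 'norm')
print(res_two_sided)

# One-tailed variants selected via the alternative parameter.
res_less = kstest(value, 'norm', alternative='less')
res_greater = kstest(value, 'norm', alternative='greater')
print(res_less)
print(res_greater)
```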
Statistical Description of Data
To see a summary of values in an array, use the function describe().
It returns the following data:
- number of observations (nobs)
- minimum and maximum values (minmax)
- mean
- variance
- skewness
- kurtosis
Example: Give statistical descriptions of the array’s values.
import numpy as np
from scipy.stats import describe

value = np.random.normal(size=50)
result = describe(value)
print(result)
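The result returned by describe() is a named tuple, so the individual statistics can also be read by attribute. This is a small sketch of that access pattern; the seed is an illustrative assumption:

```python
import numpy as np
from scipy.stats import describe

rng = np.random.default_rng(3)
value = rng.normal(size=50)
result = describe(value)

# DescribeResult is a named tuple; each statistic is an attribute.
print(result.nobs)      # number of observations
print(result.minmax)    # (minimum, maximum)
print(result.mean)
print(result.variance)
print(result.skewness)
print(result.kurtosis)
```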
Normality Tests (Skewness and Kurtosis)
Normality tests are based on skewness and kurtosis.
The normaltest() function returns the p-value for the null hypothesis:
“x comes from a normal distribution”.
Skewness
Skewness measures the symmetry of the data.
It is 0 for a normal distribution.
If the data is skewed to the left, the skewness is negative.
If the data is skewed to the right, the skewness is positive.
Kurtosis
Kurtosis measures whether the data is heavy-tailed or light-tailed compared to a normal distribution.
- Positive kurtosis: heavy-tailed
- Negative kurtosis: light-tailed
Example: Find skewness and kurtosis of values in an array.
import numpy as np
from scipy.stats import skew, kurtosis

value = np.random.normal(size=50)

print(skew(value))
print(kurtosis(value))
Example: Find if the data comes from a normal distribution.
import numpy as np
from scipy.stats import normaltest

value = np.random.normal(size=50)
print(normaltest(value))
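To interpret the result, the p-value returned by normaltest() can be compared against an alpha value. The 0.05 threshold, the seed, and the larger sample size are illustrative assumptions (normaltest() warns when there are fewer than 20 observations):

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(4)
value = rng.normal(size=500)  # larger sample for a more reliable test

# normaltest returns a (statistic, pvalue) named tuple.
statistic, p = normaltest(value)
print(p)

alpha = 0.05  # illustrative significance level
if p < alpha:
    print("Reject the null hypothesis: the data does not look normal.")
else:
    print("Fail to reject: the data is consistent with a normal distribution.")
```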