In this article, you will learn about percentiles and data distribution.
Machine Learning Data Distribution – Before moving ahead, let’s take a look at Introduction to Machine Learning SD
Percentile
What is Percentile?
In simple words, a percentile shows where an observation stands relative to the other observations in a data set.
In statistics, a percentile is the value below which a given percentage of the values in a data set fall.
Example: A record contains the marks of ten students.
Marks = [50, 58, 80, 45, 62, 86, 78, 82, 96, 51]
What does the 80th percentile mean? For these marks it is 82.8, meaning that 80% of the students scored 82.8 or lower.
The NumPy module provides the percentile() function for computing percentiles.
Example: Use the percentile() function to find the 80th percentile.
import numpy as np
Marks = [50, 58, 80, 45, 62, 86, 78, 82, 96, 51]
# Value at or below which 80% of the marks fall
x = np.percentile(Marks, 80)
print(x)
Output -
82.8
The result, 82.8, is the mark at or below which 80% of the students scored.
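One way to check this reading of the 80th percentile is to count how many marks are at or below the returned value. The sketch below is only an illustration; the variable names `p80` and `share_at_or_below` are not from the article.

```python
import numpy as np

marks = np.array([50, 58, 80, 45, 62, 86, 78, 82, 96, 51])

# 80th percentile of the marks, using NumPy's default settings.
p80 = np.percentile(marks, 80)

# Fraction of students whose mark is at or below that value.
share_at_or_below = np.mean(marks <= p80)

print(p80)
print(share_at_or_below)
```

Eight of the ten marks (45, 50, 51, 58, 62, 78, 80, 82) are at or below 82.8, which is 80% of the students.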
Example: What is the mark at or below which 45% of the students scored?
import numpy as np
Marks = [50, 58, 80, 45, 62, 86, 78, 82, 96, 51]
# Value at or below which 45% of the marks fall
x = np.percentile(Marks, 45)
print(x)
Output -
62.8
As shown, 62.8 is the mark at or below which 45% of the students scored.
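Neither 82.8 nor 62.8 appears in the list of marks, because by default percentile() interpolates linearly between the two nearest sorted values. The helper below, `linear_percentile`, is a hypothetical name used only for this sketch. It reproduces that calculation by hand under the assumption of the default linear method; it is not NumPy's actual implementation.

```python
import math
import numpy as np

def linear_percentile(values, q):
    """Percentile by linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    # Fractional 0-based position of the q-th percentile in the sorted list.
    position = (q / 100) * (len(ordered) - 1)
    lower = math.floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

marks = [50, 58, 80, 45, 62, 86, 78, 82, 96, 51]
manual_45 = linear_percentile(marks, 45)
numpy_45 = np.percentile(marks, 45)
print(manual_45, numpy_45)
```

For the 45th percentile the position is 0.45 × 9 = 4.05, so the result lies 5% of the way from the fifth sorted mark (62) to the sixth (78), which gives 62 + 0.05 × 16 = 62.8.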
Data Distribution
Earlier, we worked with very small amounts of data, which is not representative of the data found in real-world problems.
Real-world data sets are much larger, but gathering real data can be difficult in the early stages of a project.
So how can we get a large data set?
Big Data Sets
To build large data sets for testing, we can use the Python module NumPy, which offers several functions for generating random data sets of any size.
Example: Create a set of 100 random floating-point numbers between 10 and 20.
import numpy as np
# Arguments: lower bound, upper bound, number of values
data_set = np.random.uniform(10, 20, 100)
print(data_set)
Output -
[13.38299205 18.01580669 17.02081562 11.77377081 12.69501008 18.76522416
13.91344119 13.60202607 12.15678422 14.11413323 13.21394625 16.54937797
13.59474679 14.26320092 13.80828941 16.34440104 10.39993553 15.46494338
10.41695753 12.76514056 11.72558636 19.02983381 10.66168355 11.76709056
15.60670045 17.72205624 18.95663704 15.5515067 11.90693495 16.31697104
11.4483667 15.11501072 11.52720814 11.2289283 15.12808577 12.61315005
14.39045934 10.85921454 11.30185564 15.31424129 19.27459975 10.70108656
19.94672168 16.39083343 13.21372947 10.80109336 13.00774192 18.62775174
14.4165342 12.11367457 15.7729824 19.65623126 16.19241096 16.11580766
12.26976805 19.34958711 14.08468552 13.09321088 17.67535161 17.96116085
10.68782391 17.55858529 18.32730636 11.86655782 13.03786586 10.41227579
14.56578666 11.27612942 17.48043824 11.97837879 15.50466613 14.48695143
19.79191699 13.12199032 18.88577316 10.23694902 17.68729269 18.28547097
14.05099738 11.09106986 19.53213855 14.32717191 18.45282231 18.28829096
19.65434796 13.94371623 12.72428148 13.46691759 13.28333032 17.16257679
14.86544913 12.85619305 10.39865977 13.98904176 14.56325333 14.96543929
15.15778962 13.39786431 10.01417788 12.51462504]
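Reading 100 printed numbers is not a practical way to check the data. A few summary values are usually enough to confirm that the set has the expected size and range. This is a minimal sketch; the printed numbers will differ on every run because the data is random.

```python
import numpy as np

data_set = np.random.uniform(10, 20, 100)

# uniform(low, high, size) draws values from the interval between low and high.
print("count:", data_set.size)
print("min:  ", data_set.min())
print("max:  ", data_set.max())
print("mean: ", data_set.mean())
```

For a uniform distribution between 10 and 20, the mean should land somewhere near 15.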
Histogram
To visualize the data set as a graph, we can draw a histogram with the Matplotlib function hist().
Note: Find out more about using the Matplotlib module by reading Matplotlib Tutorial.
Example: Draw a histogram with the given data set.
import numpy as np
import matplotlib.pyplot as plt
data_set = np.random.uniform(10, 20, 100)
# The second argument is the number of bins (bars) in the histogram
plt.hist(data_set, 18)
plt.show()
The result is a histogram of the generated data.
Note: The data is generated randomly, so the result will differ every time the code runs.
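If you need the same data on every run, for example to compare two histograms, you can seed the random number generator. The sketch below uses NumPy's Generator API (np.random.default_rng) instead of the np.random.uniform call used in this article. The seed value 42 is arbitrary and chosen only for this illustration.

```python
import numpy as np

SEED = 42  # arbitrary value chosen for this sketch

# Two generators created with the same seed produce the same sequence.
first_run = np.random.default_rng(SEED).uniform(10, 20, 100)
second_run = np.random.default_rng(SEED).uniform(10, 20, 100)

print(np.array_equal(first_run, second_run))  # True
```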
Big Data Distributions
A set of 100 values is not considered large, but now you know how to create an array of random values. By changing the parameters, you can make the data set as large as you like.
Example: Create a set of 1000 random numbers from 1 to 20.
import numpy as np
# 1000 values between 1 and 20
data_set = np.random.uniform(1, 20, 1000)
print(data_set)
Output -
[ 2.68284406 19.30694991 16.65383921 18.55471701 4.70346159 16.29699052
9.50559452 11.10950173 14.3041972 1.39679134 17.72094537 4.14102451
9.54203582 18.77060463 12.51891703 17.31634713 11.70200927 12.05415392
3.81738685 1.5661812 11.24821039 18.54222366 16.18801869 18.99189221
14.56934761 5.3644479 17.91362365 17.80043123 6.36233596 2.5723287
3.2887804 18.82218981 6.19916744 8.80655601 11.0755259 12.81059619
17.84134297 19.93284239 14.88767059 15.05756906 8.96185196 4.75425607
15.35962198 11.69880277 15.90197501 2.34193269 4.83790642 3.16225479
6.1547387 4.12802624 12.35229347 1.11053044 3.76303772 13.03854785
19.6487084 13.94900319 16.67858887 7.46180519 7.67090646 17.66687265
3.45180209 13.41481762 19.29474335 12.12074493 1.84056973 18.12877054
5.94005805 19.44342378 1.21762351 17.90821213 4.08002328 7.18451625
5.06869492 15.40131127 10.5710893 7.78646599 16.87135914 11.38047853
4.23085077 5.8045335 17.64617202 1.60025407 2.31686046 13.02668694
6.12200063 18.54691212 7.03539393 7.13276625 12.89408759 13.56044378
5.94902979 12.28058722 15.86691714 4.54260625 1.90325003 13.97100725
17.09954854 3.14509746 10.95124018 10.31550978 10.93273627 17.0859282
14.29736977 12.6945781 14.1120251 14.2543168 4.97813506 7.50472078
9.35401342 6.4704281 6.32950381 9.8485998 13.27325802 9.56066328
1.16270337 9.73355442 8.83447507 13.75864309 16.54317726 1.4462913
18.0424996 13.58435098 10.45178192 5.57316267 6.60111476 14.60948159
9.01016988 4.73155305 10.72594475 1.92208041 16.75167933 19.46995145
18.1144034 14.47006203 13.92000884 3.3824039 8.74174971 18.1798602
17.35194918 15.85792074 8.29637198 6.37516705 10.51614556 19.92357184
12.37006969 1.95233356 9.55441568 1.40142866 10.1312481 18.02089924
18.29029191 17.73409822 17.84251397 9.7447081 9.71481697 13.6548868
17.37030435 16.36110037 2.20253671 1.25945193 16.98632498 16.66871667
16.47734076 14.75418909 11.82011605 14.65770869 10.33160747 6.15865809
18.86801083 15.27174114 15.78001809 7.97126725 8.75705367 14.11080899
18.0901586 4.33671917 6.34645877 16.0543543 13.74162995 5.62893212
6.87519141 4.18667454 18.12897226 16.76950034 6.63657079 7.62916742
12.31055102 12.63054808 9.05282658 4.458712 16.03326673 8.9712681
14.37981465 13.01470915 11.77843574 5.0598084 8.56230647 7.62765179
18.08111907 16.00672054 11.85420408 7.55759042 17.56853919 5.87619219
14.82067182 16.53249367 16.14679673 8.81673644 2.36191931 7.91540339
14.60880143 18.25186613 1.35301916 1.39289951 7.19234846 8.59003438
10.65926329 11.56859105 13.3086017 10.53845096 9.05920321 2.38675077
13.98210117 12.27678627 2.11788438 10.56490776 8.1807473 10.23851933
12.1700239 12.70801787 1.70334211 10.04025611 7.03456016 10.96545274
13.36661346 12.01991377 4.47020019 5.25230198 18.55228793 2.9751601
19.30324689 4.70227332 3.78800656 5.65181784 6.27681342 2.29156551
4.85132641 2.2168593 19.18334034 1.84605864 5.68471505 5.82173105
18.9890535 9.99189014 13.19219269 6.00917318 10.66303158 16.59943689
18.49386956 6.86122869 4.65412214 2.83544282 14.39486846 2.79235635
17.98245437 14.63851702 2.86578784 3.30032668 3.81220202 9.99366174
8.033083 11.99681773 17.51653442 3.76386469 11.55797275 7.74000305
1.38022942 3.70198612 15.39116237 13.39837799 12.38785551 3.32810113
9.06933208 14.56616268 8.47560263 13.65045843 7.58276071 16.247074
19.64338362 11.79075262 5.58799464 15.83210538 14.43117395 11.5508438
5.82289929 2.30115398 5.61743903 13.72937397 14.41120782 17.15616908
15.74530842 18.03067512 9.8895974 5.10173368 14.9546928 9.42473319
18.14967249 6.17852932 18.62979844 13.81425162 10.40391681 5.04455942
19.65621511 8.64739888 15.06320431 17.58040548 15.57964799 5.39830662
17.04715289 6.7352639 12.07345064 2.56205838 8.79697532 7.15802427
12.44821443 16.45619984 16.72507094 4.70716806 1.48013735 5.20611648
12.15979943 3.18866054 16.47910834 16.28355109 3.62130467 3.07307956
2.60717192 17.08254354 8.10494809 10.58204015 18.39375232 14.53077326
7.49103107 12.29224884 6.65818839 8.44392058 5.19610276 8.16156943
11.90445797 17.21148527 14.39354418 1.47673518 6.85730247 12.94819688
7.30028159 16.09814415 8.01658236 5.70582793 14.05202239 3.07217961
5.51333652 9.79646816 12.73109336 13.42984753 1.19931686 18.98223597
9.21191132 10.77960153 3.44222073 5.91753139 8.9059537 17.81532335
11.26923825 1.70155757 8.94562283 4.11207134 10.24657421 10.90081654
3.25893502 2.69287545 3.04676386 9.46630252 18.27922935 2.15723553
14.88767479 19.53463582 11.43093946 8.28183849 5.40649948 17.94455735
1.78805073 1.04516752 7.06630742 14.34729831 4.52037893 12.81790365
16.03971273 4.14512564 18.92757643 6.73775376 12.69152368 9.60705336
5.96629972 11.92864593 6.73882302 18.42825044 19.44192096 13.98098841
19.18517718 12.65311457 1.26130143 10.36038209 13.79683437 10.52840618
7.92908896 16.22838485 12.38787933 12.44017955 12.07564921 19.16857589
1.72209157 8.45158243 19.29683279 1.90449683 8.16089726 8.41843022
12.60912041 16.56266832 3.87542206 14.65918682 8.95657421 6.8362758
11.43972523 11.53629186 1.1818171 2.89668459 11.55067712 19.71487259
13.53286144 18.25786524 6.83215967 13.19429236 12.41575259 3.90490732
7.79793911 1.35720961 19.38969549 16.68092726 6.39675592 11.58863663
1.52434281 17.76421562 7.01080559 5.98679707 19.6360701 19.02191209
11.86296726 6.66684482 18.01003076 12.9927335 11.78975425 14.44809825
9.76253463 11.23858045 3.47369978 11.57247884 13.41913773 4.01150967
13.88701186 8.13189203 2.26619129 17.04138505 2.96337124 13.72430942
10.11391851 18.13870581 4.07872642 18.96126303 6.05165301 1.62107589
6.27944621 11.63782584 15.87898477 13.50048893 10.42468209 14.32255352
3.56642721 5.39458348 3.60331924 10.6333141 19.91236931 11.12253267
16.95428674 14.37098124 17.05158391 9.24116609 3.72413343 11.15113696
2.31104941 15.44686824 1.35054295 9.54523556 15.5672758 11.55648213
19.61085311 18.96168866 10.35441923 13.53334166 4.69494122 5.45828648
8.99397961 9.88431995 17.40923859 2.53905286 14.10332414 4.77018695
6.92331657 8.87406083 14.9990223 19.2697055 18.31991966 10.335444
14.40082601 19.05141842 6.45326835 10.56246992 6.93588843 16.1575082
8.72344603 19.21693783 12.40084231 2.82880249 17.26081728 15.50571215
18.22913392 9.08525062 8.81517044 14.00295998 11.27110522 15.06006983
8.78592585 6.03519428 1.21862452 2.12559042 14.9782146 19.04203281
1.24332743 3.97096582 16.90648162 8.28166465 14.18988428 2.08693087
1.08651955 3.21052702 11.8330645 5.13361811 1.84292134 19.55180562
6.76788255 7.03221102 15.98148186 13.94630962 3.5548162 3.6278384
16.01803076 12.46003371 1.94028536 14.45309543 16.88674581 13.60143857
2.47334437 16.44452332 10.36331142 13.50185818 1.80504367 5.35045681
7.53205433 15.14150467 4.60184005 18.82810544 19.66101154 13.16138002
19.01445706 1.98284414 5.31535847 19.65592877 8.65797539 12.7629716
16.25685409 13.61716591 19.57035376 9.91142541 14.03565244 15.75965915
4.9445043 15.45547834 11.203243 9.8733124 1.9328347 12.58114448
6.879032 14.59125248 4.15563702 8.28933544 11.29932155 3.49515306
10.54992373 12.29487422 18.54678782 10.66959335 16.46020035 7.70054899
13.10847404 14.31281761 6.40356048 19.64755775 9.4222633 5.52483508
17.36450206 8.88394896 3.00485118 8.78415 9.67515373 7.51144372
2.02538234 4.72197569 9.68805904 17.97288966 13.80733266 14.28454767
7.68220899 3.40843569 9.24727294 6.57583007 2.8914721 19.59426975
11.60274023 11.39963997 3.26091623 18.52749383 14.73757444 1.21068702
7.08294279 18.00521277 2.78958391 9.79245563 4.78128938 14.70301425
11.75840885 16.25484715 6.90578491 15.02079385 14.28398209 17.46925102
18.56251637 14.52477152 4.93576374 2.72879774 18.97938336 6.85353327
1.49163371 2.6337953 17.18447156 15.01752815 12.40977831 1.52975153
17.34197869 3.43182961 1.57839746 11.73795199 19.05783612 19.71879662
11.37938967 16.08249135 13.19375133 19.21793169 17.99955706 17.20476262
17.43288283 3.51212476 3.64369151 10.51111751 10.77195893 10.37812739
4.12083327 6.58363575 1.82828506 7.55504627 9.12313459 17.32197239
9.45846971 1.32421291 9.13941967 13.93079524 10.82720674 4.59170739
9.67271158 17.37155406 17.56945832 4.10585577 11.86716721 8.9094759
5.25787425 11.41212783 12.29677281 12.5607613 1.56572299 16.13805135
6.63457627 14.58395642 11.59932162 8.65977715 11.17144656 9.92630729
15.66162148 7.82260476 10.22238646 5.19037597 18.82565311 15.9281575
16.53349542 1.22812144 18.44855757 13.64598686 9.54173728 11.01796179
6.32587344 5.94682566 13.65491177 15.28287223 16.92188191 13.27513146
8.34912492 12.93857362 11.56082266 9.53413789 14.70346224 9.83312831
6.68529342 15.56211145 16.88310286 14.9993651 4.56165993 8.00319503
4.83032029 18.52915637 8.81295345 6.48064353 3.75929371 4.91112693
17.09476815 5.2739131 4.69094164 7.82627319 14.52544194 16.25652485
4.99139303 14.48741318 15.28226039 13.0599178 3.01140484 16.37546923
8.16349377 7.77769467 7.20164244 4.87508143 2.04154432 15.30465129
4.47306262 17.21360493 8.59699346 9.77599858 1.45770273 2.7048445
11.92154927 8.89460821 9.78640363 12.81791229 11.28708905 8.28616898
16.96559147 8.54799823 18.86598749 19.78388517 15.10944487 12.09970784
11.27405175 10.57452053 7.7802434 19.19013378 3.29663787 1.72740293
7.38176408 13.0658844 13.01662051 14.06911427 9.01198737 12.56844029
14.71080473 18.51037596 11.59073208 3.10104498 3.69900617 12.82106544
14.63939889 15.75955821 8.4277436 10.05631337 11.24681532 13.56329425
7.65946352 2.17431344 3.05459262 18.24270356 17.67016885 1.70765799
5.82143697 3.821201 2.02213436 4.80726968 13.68237456 18.49932392
9.49306261 15.14684949 2.68074297 8.02049596 16.41188829 18.72110056
12.84778665 4.77533094 4.04047908 17.8435927 8.4660641 15.61991871
2.19808815 6.71597302 11.3959533 3.8019543 10.81602106 13.22017599
13.93689926 9.61999353 7.18364705 6.76681695 13.61586318 2.56682801
1.2635081 13.14878474 19.74097372 18.96720024 11.6213272 1.35385837
9.22516327 18.62260496 11.30850318 16.55786945 17.0795904 4.35594454
5.83906367 6.94655833 3.23484597 17.94922942 12.75849988 7.65597128
2.44667273 14.20888511 16.44046922 12.13289775 16.47074426 10.46082682
17.53321003 9.40250986 3.80957857 1.14144124 5.95104479 2.55214873
15.03623717 11.98151374 4.83774944 16.54707546 18.89287499 3.675759
6.8657436 6.99032026 7.99076721 4.87949426 19.46837191 3.79614568
6.72792438 11.26955338 6.90773764 6.33031443 11.34585115 14.81423048
19.12230506 16.78362203 9.04538296 8.34005977 16.70997302 17.9694551
19.14426204 2.99099817 18.26237007 7.91653762 13.8818763 9.96841786
11.62077927 6.27019828 11.75004855 3.26273425 13.86333844 4.40368066
6.43366134 17.98923619 14.73532524 14.07716539 10.19648463 14.55291085
3.97172266 3.20639077 16.8158471 16.6589792 11.63183583 15.11471195
2.85747805 16.43737151 11.25354336 14.25672552 13.4879228 13.83305666
11.83302899 14.53237581 19.46133221 11.14732109 9.9364367 19.10141919
19.84264944 10.82693011 13.59181113 14.12451847 18.80255227 12.91549099
17.21204908 13.4456572 14.21361558 13.42743164 16.0411952 3.8614885
6.79280165 3.20691356 9.28432956 11.11471783 6.88681969 4.34494846
15.33603516 4.01908672 10.1040883 11.29970739 4.51261244 18.09215055
3.61361207 15.59025818 17.302529 3.28711345 17.11885296 16.78262672
14.58830822 18.84395424 12.59180699 14.15110893 17.45168623 14.7809512
7.44724417 13.48323567 4.90653204 12.86433649 3.03764207 7.6844456
8.62670856 1.93745013 6.35474297 4.69304274 1.41573836 2.35272146
16.82016687 18.09218852 19.6064135 14.5920794 17.58456152 15.83111451
5.37509657 3.34028129 4.1456231 13.9866889 6.18081457 4.49961602
1.69223174 4.19981165 12.49173066 16.32761904]
Example: Draw a histogram with the given data set.
import numpy as np
import matplotlib.pyplot as plt
data_set = np.random.uniform(1, 20, 1000)
# 250 bins for 1000 values, so each bar holds about 4 values on average
plt.hist(data_set, 250)
plt.show()
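The numbers behind a histogram can also be inspected without drawing anything. As a rough sketch, np.histogram() returns the count of values in each bin together with the bin edges. The choice of 19 bins, each one unit wide, is an assumption made for this illustration and is not taken from the example above.

```python
import numpy as np

data_set = np.random.uniform(1, 20, 1000)

# Split the range 1 to 20 into 19 bins, each one unit wide, and count the values in each bin.
counts, edges = np.histogram(data_set, bins=19, range=(1, 20))

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count}")
```

Because the values are drawn from a uniform distribution, every bin should hold a similar number of values, roughly 1000 / 19, or about 53, with some random variation from run to run.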