
Introduction to Machine Learning Percentiles & Data Distribution

In this article, you will learn about percentiles and data distribution.

Before moving ahead, let’s take a look at Introduction to Machine Learning SD (standard deviation).

Percentile

What is a Percentile?

In simple words, a percentile shows where an observation stands relative to the other observations in a data set.

In statistics, a percentile is the value below which a given percentage of the values in a data set fall.

Example: A record contains the marks of ten students.

Marks = [50, 58, 80, 45, 62, 86, 78, 82, 96, 51]

What does the 80th percentile mean? For these marks it is 82.8, meaning that 80% of the students scored 82.8 or lower.

The NumPy module has a function called percentile() for finding percentiles.

Example: Use the percentile() function to find the 80th percentile.

				
import numpy as np

Marks = [50, 58, 80, 45, 62, 86, 78, 82, 96, 51]

# 80th percentile of the marks
x = np.percentile(Marks, 80)

print(x)

Output:

82.8
				
			

As a result, it returned the mark at or below which 80% of the students scored.
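To see where 82.8 comes from, the calculation can be reproduced by hand. The following is a minimal sketch, assuming NumPy's default linear interpolation between neighbouring sorted values; the helper name manual_percentile is hypothetical and used only for illustration.

```python
import math


def manual_percentile(values, q):
    """Percentile by linear interpolation between the two nearest sorted values.

    Hypothetical helper for illustration; it is meant to mirror the default
    ("linear") behaviour of np.percentile.
    """
    ordered = sorted(values)
    # Fractional 0-based position of the q-th percentile in the sorted list.
    position = (q / 100) * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    # Move `fraction` of the way from the lower neighbour to the upper one.
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


Marks = [50, 58, 80, 45, 62, 86, 78, 82, 96, 51]

# Position is 0.8 * 9 = 7.2, so the result lies 20% of the way from 82 to 86.
print(round(manual_percentile(Marks, 80), 2))
```

The sorted marks are 45, 50, 51, 58, 62, 78, 80, 82, 86, 96, so the 80th percentile falls between the eighth and ninth values, which is why the answer is not one of the original marks.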

Example: What is the 45th percentile of the marks, that is, the mark below which roughly 45% of the students scored?

				
import numpy as np

Marks = [50, 58, 80, 45, 62, 86, 78, 82, 96, 51]

# 45th percentile of the marks
x = np.percentile(Marks, 45)

print(x)

Output:

62.8
				
			

As shown, it returned 62.8, the 45th percentile: roughly 45% of the marks lie below this value (with only ten marks, the split cannot be exact).
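Several percentiles can also be requested in one call. The sketch below assumes the same Marks list and passes a list of percentages to np.percentile(); the variable name quartiles is chosen here only for illustration.

```python
import numpy as np

Marks = [50, 58, 80, 45, 62, 86, 78, 82, 96, 51]

# Passing a list of percentages returns one value per percentage.
quartiles = np.percentile(Marks, [25, 50, 75])

print(quartiles)

# The 50th percentile is the same thing as the median.
print(np.median(Marks))
```

The 25th, 50th, and 75th percentiles are commonly called quartiles, because they cut the sorted data into four parts.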

Data Distribution

Earlier, we worked with data only on a small scale: a handful of values, far fewer than a real-world problem involves.

Real-world data sets are much larger, but they can be hard to obtain in the early stage of a small project.

So how do we get big data sets?

Big Data Sets

To build large data sets, we use the Python module NumPy, which provides several functions for generating random data sets of any size.

Example: Create a set of 100 random numbers between 10 and 20.

				
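import numpy as np

# 100 random floats drawn uniformly from the interval [10, 20)
data_set = np.random.uniform(10, 20, 100)

print(data_set)

Output (your values will differ, because the data is random):

[13.38299205 18.01580669 17.02081562 11.77377081 12.69501008 18.76522416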
 13.91344119 13.60202607 12.15678422 14.11413323 13.21394625 16.54937797
 13.59474679 14.26320092 13.80828941 16.34440104 10.39993553 15.46494338
 10.41695753 12.76514056 11.72558636 19.02983381 10.66168355 11.76709056
 15.60670045 17.72205624 18.95663704 15.5515067  11.90693495 16.31697104
 11.4483667  15.11501072 11.52720814 11.2289283  15.12808577 12.61315005
 14.39045934 10.85921454 11.30185564 15.31424129 19.27459975 10.70108656
 19.94672168 16.39083343 13.21372947 10.80109336 13.00774192 18.62775174
 14.4165342  12.11367457 15.7729824  19.65623126 16.19241096 16.11580766
 12.26976805 19.34958711 14.08468552 13.09321088 17.67535161 17.96116085
 10.68782391 17.55858529 18.32730636 11.86655782 13.03786586 10.41227579
 14.56578666 11.27612942 17.48043824 11.97837879 15.50466613 14.48695143
 19.79191699 13.12199032 18.88577316 10.23694902 17.68729269 18.28547097
 14.05099738 11.09106986 19.53213855 14.32717191 18.45282231 18.28829096
 19.65434796 13.94371623 12.72428148 13.46691759 13.28333032 17.16257679
 14.86544913 12.85619305 10.39865977 13.98904176 14.56325333 14.96543929
 15.15778962 13.39786431 10.01417788 12.51462504]
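Reading 100 printed numbers makes it hard to confirm that they really lie between 10 and 20. The following is a minimal sketch, assuming a few summary statistics are enough to check the range; the exact values printed will differ on every run.

```python
import numpy as np

data_set = np.random.uniform(10, 20, 100)

# uniform(low, high, size) draws from the half-open interval [low, high).
print(len(data_set))     # number of values generated
print(data_set.min())    # smallest value, never below 10
print(data_set.max())    # largest value, always below 20
print(data_set.mean())   # should land somewhere near 15
```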
				
			

Histogram

To visualize the data set as a graph, we can use the Matplotlib function hist().

Note: Find out more about using the Matplotlib module by reading the Matplotlib Tutorial.

Example: Draw a histogram with the given data set.

				
import numpy as np
import matplotlib.pyplot as plt

data_set = np.random.uniform(10, 20, 100)

# The second argument is the number of bins (bars) in the histogram
plt.hist(data_set, 18)

plt.show()

				
			
[Figure: histogram of the 100 random values, drawn with 18 bins]

As a result, it drew a histogram of the given data.
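The bars of a histogram are simply counts of how many values fall into each bin. As a sketch, assuming you want those counts as numbers rather than as a picture, NumPy's histogram() function returns them directly:

```python
import numpy as np

data_set = np.random.uniform(10, 20, 100)

# Split the data into 18 equal-width bins and count the values in each one.
counts, bin_edges = np.histogram(data_set, bins=18)

print(counts)         # number of values that fell into each bin
print(bin_edges)      # 19 edges delimit the 18 bins
print(counts.sum())   # every value lands in exactly one bin
```

Because the numbers are drawn uniformly, the counts should be roughly similar across the bins, although with only 100 values some unevenness is expected.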

Note: The data is generated randomly, so the result will differ on every run.
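If repeatable results are needed, for example to compare two runs, the random generator can be seeded. This is a sketch using NumPy's default_rng() generator, which the examples in this article do not use; the seed value 42 is an arbitrary assumption.

```python
import numpy as np

# A fixed seed makes the "random" data repeatable; 42 is an arbitrary choice.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first_run = rng_a.uniform(10, 20, 100)
second_run = rng_b.uniform(10, 20, 100)

# Both generators started from the same seed, so the arrays are identical.
print(np.array_equal(first_run, second_run))
```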

Big Data Distributions

A set of 100 values is not a lot, but by changing the parameters you can make the array of random values as large as you’d like.

Example: Create a set of 1000 random numbers from 1 to 20.

				
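import numpy as np

# 1000 random floats drawn uniformly from the interval [1, 20)
data_set = np.random.uniform(1, 20, 1000)

print(data_set)

Output (your values will differ, because the data is random):

[ 2.68284406 19.30694991 16.65383921 18.55471701  4.70346159 16.29699052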
  9.50559452 11.10950173 14.3041972   1.39679134 17.72094537  4.14102451
  9.54203582 18.77060463 12.51891703 17.31634713 11.70200927 12.05415392
  3.81738685  1.5661812  11.24821039 18.54222366 16.18801869 18.99189221
 14.56934761  5.3644479  17.91362365 17.80043123  6.36233596  2.5723287
  3.2887804  18.82218981  6.19916744  8.80655601 11.0755259  12.81059619
 17.84134297 19.93284239 14.88767059 15.05756906  8.96185196  4.75425607
 15.35962198 11.69880277 15.90197501  2.34193269  4.83790642  3.16225479
  6.1547387   4.12802624 12.35229347  1.11053044  3.76303772 13.03854785
 19.6487084  13.94900319 16.67858887  7.46180519  7.67090646 17.66687265
  3.45180209 13.41481762 19.29474335 12.12074493  1.84056973 18.12877054
  5.94005805 19.44342378  1.21762351 17.90821213  4.08002328  7.18451625
  5.06869492 15.40131127 10.5710893   7.78646599 16.87135914 11.38047853
  4.23085077  5.8045335  17.64617202  1.60025407  2.31686046 13.02668694
  6.12200063 18.54691212  7.03539393  7.13276625 12.89408759 13.56044378
  5.94902979 12.28058722 15.86691714  4.54260625  1.90325003 13.97100725
 17.09954854  3.14509746 10.95124018 10.31550978 10.93273627 17.0859282
 14.29736977 12.6945781  14.1120251  14.2543168   4.97813506  7.50472078
  9.35401342  6.4704281   6.32950381  9.8485998  13.27325802  9.56066328
  1.16270337  9.73355442  8.83447507 13.75864309 16.54317726  1.4462913
 18.0424996  13.58435098 10.45178192  5.57316267  6.60111476 14.60948159
  9.01016988  4.73155305 10.72594475  1.92208041 16.75167933 19.46995145
 18.1144034  14.47006203 13.92000884  3.3824039   8.74174971 18.1798602
 17.35194918 15.85792074  8.29637198  6.37516705 10.51614556 19.92357184
 12.37006969  1.95233356  9.55441568  1.40142866 10.1312481  18.02089924
 18.29029191 17.73409822 17.84251397  9.7447081   9.71481697 13.6548868
 17.37030435 16.36110037  2.20253671  1.25945193 16.98632498 16.66871667
 16.47734076 14.75418909 11.82011605 14.65770869 10.33160747  6.15865809
 18.86801083 15.27174114 15.78001809  7.97126725  8.75705367 14.11080899
 18.0901586   4.33671917  6.34645877 16.0543543  13.74162995  5.62893212
  6.87519141  4.18667454 18.12897226 16.76950034  6.63657079  7.62916742
 12.31055102 12.63054808  9.05282658  4.458712   16.03326673  8.9712681
 14.37981465 13.01470915 11.77843574  5.0598084   8.56230647  7.62765179
 18.08111907 16.00672054 11.85420408  7.55759042 17.56853919  5.87619219
 14.82067182 16.53249367 16.14679673  8.81673644  2.36191931  7.91540339
 14.60880143 18.25186613  1.35301916  1.39289951  7.19234846  8.59003438
 10.65926329 11.56859105 13.3086017  10.53845096  9.05920321  2.38675077
 13.98210117 12.27678627  2.11788438 10.56490776  8.1807473  10.23851933
 12.1700239  12.70801787  1.70334211 10.04025611  7.03456016 10.96545274
 13.36661346 12.01991377  4.47020019  5.25230198 18.55228793  2.9751601
 19.30324689  4.70227332  3.78800656  5.65181784  6.27681342  2.29156551
  4.85132641  2.2168593  19.18334034  1.84605864  5.68471505  5.82173105
 18.9890535   9.99189014 13.19219269  6.00917318 10.66303158 16.59943689
 18.49386956  6.86122869  4.65412214  2.83544282 14.39486846  2.79235635
 17.98245437 14.63851702  2.86578784  3.30032668  3.81220202  9.99366174
  8.033083   11.99681773 17.51653442  3.76386469 11.55797275  7.74000305
  1.38022942  3.70198612 15.39116237 13.39837799 12.38785551  3.32810113
  9.06933208 14.56616268  8.47560263 13.65045843  7.58276071 16.247074
 19.64338362 11.79075262  5.58799464 15.83210538 14.43117395 11.5508438
  5.82289929  2.30115398  5.61743903 13.72937397 14.41120782 17.15616908
 15.74530842 18.03067512  9.8895974   5.10173368 14.9546928   9.42473319
 18.14967249  6.17852932 18.62979844 13.81425162 10.40391681  5.04455942
 19.65621511  8.64739888 15.06320431 17.58040548 15.57964799  5.39830662
 17.04715289  6.7352639  12.07345064  2.56205838  8.79697532  7.15802427
 12.44821443 16.45619984 16.72507094  4.70716806  1.48013735  5.20611648
 12.15979943  3.18866054 16.47910834 16.28355109  3.62130467  3.07307956
  2.60717192 17.08254354  8.10494809 10.58204015 18.39375232 14.53077326
  7.49103107 12.29224884  6.65818839  8.44392058  5.19610276  8.16156943
 11.90445797 17.21148527 14.39354418  1.47673518  6.85730247 12.94819688
  7.30028159 16.09814415  8.01658236  5.70582793 14.05202239  3.07217961
  5.51333652  9.79646816 12.73109336 13.42984753  1.19931686 18.98223597
  9.21191132 10.77960153  3.44222073  5.91753139  8.9059537  17.81532335
 11.26923825  1.70155757  8.94562283  4.11207134 10.24657421 10.90081654
  3.25893502  2.69287545  3.04676386  9.46630252 18.27922935  2.15723553
 14.88767479 19.53463582 11.43093946  8.28183849  5.40649948 17.94455735
  1.78805073  1.04516752  7.06630742 14.34729831  4.52037893 12.81790365
 16.03971273  4.14512564 18.92757643  6.73775376 12.69152368  9.60705336
  5.96629972 11.92864593  6.73882302 18.42825044 19.44192096 13.98098841
 19.18517718 12.65311457  1.26130143 10.36038209 13.79683437 10.52840618
  7.92908896 16.22838485 12.38787933 12.44017955 12.07564921 19.16857589
  1.72209157  8.45158243 19.29683279  1.90449683  8.16089726  8.41843022
 12.60912041 16.56266832  3.87542206 14.65918682  8.95657421  6.8362758
 11.43972523 11.53629186  1.1818171   2.89668459 11.55067712 19.71487259
 13.53286144 18.25786524  6.83215967 13.19429236 12.41575259  3.90490732
  7.79793911  1.35720961 19.38969549 16.68092726  6.39675592 11.58863663
  1.52434281 17.76421562  7.01080559  5.98679707 19.6360701  19.02191209
 11.86296726  6.66684482 18.01003076 12.9927335  11.78975425 14.44809825
  9.76253463 11.23858045  3.47369978 11.57247884 13.41913773  4.01150967
 13.88701186  8.13189203  2.26619129 17.04138505  2.96337124 13.72430942
 10.11391851 18.13870581  4.07872642 18.96126303  6.05165301  1.62107589
  6.27944621 11.63782584 15.87898477 13.50048893 10.42468209 14.32255352
  3.56642721  5.39458348  3.60331924 10.6333141  19.91236931 11.12253267
 16.95428674 14.37098124 17.05158391  9.24116609  3.72413343 11.15113696
  2.31104941 15.44686824  1.35054295  9.54523556 15.5672758  11.55648213
 19.61085311 18.96168866 10.35441923 13.53334166  4.69494122  5.45828648
  8.99397961  9.88431995 17.40923859  2.53905286 14.10332414  4.77018695
  6.92331657  8.87406083 14.9990223  19.2697055  18.31991966 10.335444
 14.40082601 19.05141842  6.45326835 10.56246992  6.93588843 16.1575082
  8.72344603 19.21693783 12.40084231  2.82880249 17.26081728 15.50571215
 18.22913392  9.08525062  8.81517044 14.00295998 11.27110522 15.06006983
  8.78592585  6.03519428  1.21862452  2.12559042 14.9782146  19.04203281
  1.24332743  3.97096582 16.90648162  8.28166465 14.18988428  2.08693087
  1.08651955  3.21052702 11.8330645   5.13361811  1.84292134 19.55180562
  6.76788255  7.03221102 15.98148186 13.94630962  3.5548162   3.6278384
 16.01803076 12.46003371  1.94028536 14.45309543 16.88674581 13.60143857
  2.47334437 16.44452332 10.36331142 13.50185818  1.80504367  5.35045681
  7.53205433 15.14150467  4.60184005 18.82810544 19.66101154 13.16138002
 19.01445706  1.98284414  5.31535847 19.65592877  8.65797539 12.7629716
 16.25685409 13.61716591 19.57035376  9.91142541 14.03565244 15.75965915
  4.9445043  15.45547834 11.203243    9.8733124   1.9328347  12.58114448
  6.879032   14.59125248  4.15563702  8.28933544 11.29932155  3.49515306
 10.54992373 12.29487422 18.54678782 10.66959335 16.46020035  7.70054899
 13.10847404 14.31281761  6.40356048 19.64755775  9.4222633   5.52483508
 17.36450206  8.88394896  3.00485118  8.78415     9.67515373  7.51144372
  2.02538234  4.72197569  9.68805904 17.97288966 13.80733266 14.28454767
  7.68220899  3.40843569  9.24727294  6.57583007  2.8914721  19.59426975
 11.60274023 11.39963997  3.26091623 18.52749383 14.73757444  1.21068702
  7.08294279 18.00521277  2.78958391  9.79245563  4.78128938 14.70301425
 11.75840885 16.25484715  6.90578491 15.02079385 14.28398209 17.46925102
 18.56251637 14.52477152  4.93576374  2.72879774 18.97938336  6.85353327
  1.49163371  2.6337953  17.18447156 15.01752815 12.40977831  1.52975153
 17.34197869  3.43182961  1.57839746 11.73795199 19.05783612 19.71879662
 11.37938967 16.08249135 13.19375133 19.21793169 17.99955706 17.20476262
 17.43288283  3.51212476  3.64369151 10.51111751 10.77195893 10.37812739
  4.12083327  6.58363575  1.82828506  7.55504627  9.12313459 17.32197239
  9.45846971  1.32421291  9.13941967 13.93079524 10.82720674  4.59170739
  9.67271158 17.37155406 17.56945832  4.10585577 11.86716721  8.9094759
  5.25787425 11.41212783 12.29677281 12.5607613   1.56572299 16.13805135
  6.63457627 14.58395642 11.59932162  8.65977715 11.17144656  9.92630729
 15.66162148  7.82260476 10.22238646  5.19037597 18.82565311 15.9281575
 16.53349542  1.22812144 18.44855757 13.64598686  9.54173728 11.01796179
  6.32587344  5.94682566 13.65491177 15.28287223 16.92188191 13.27513146
  8.34912492 12.93857362 11.56082266  9.53413789 14.70346224  9.83312831
  6.68529342 15.56211145 16.88310286 14.9993651   4.56165993  8.00319503
  4.83032029 18.52915637  8.81295345  6.48064353  3.75929371  4.91112693
 17.09476815  5.2739131   4.69094164  7.82627319 14.52544194 16.25652485
  4.99139303 14.48741318 15.28226039 13.0599178   3.01140484 16.37546923
  8.16349377  7.77769467  7.20164244  4.87508143  2.04154432 15.30465129
  4.47306262 17.21360493  8.59699346  9.77599858  1.45770273  2.7048445
 11.92154927  8.89460821  9.78640363 12.81791229 11.28708905  8.28616898
 16.96559147  8.54799823 18.86598749 19.78388517 15.10944487 12.09970784
 11.27405175 10.57452053  7.7802434  19.19013378  3.29663787  1.72740293
  7.38176408 13.0658844  13.01662051 14.06911427  9.01198737 12.56844029
 14.71080473 18.51037596 11.59073208  3.10104498  3.69900617 12.82106544
 14.63939889 15.75955821  8.4277436  10.05631337 11.24681532 13.56329425
  7.65946352  2.17431344  3.05459262 18.24270356 17.67016885  1.70765799
  5.82143697  3.821201    2.02213436  4.80726968 13.68237456 18.49932392
  9.49306261 15.14684949  2.68074297  8.02049596 16.41188829 18.72110056
 12.84778665  4.77533094  4.04047908 17.8435927   8.4660641  15.61991871
  2.19808815  6.71597302 11.3959533   3.8019543  10.81602106 13.22017599
 13.93689926  9.61999353  7.18364705  6.76681695 13.61586318  2.56682801
  1.2635081  13.14878474 19.74097372 18.96720024 11.6213272   1.35385837
  9.22516327 18.62260496 11.30850318 16.55786945 17.0795904   4.35594454
  5.83906367  6.94655833  3.23484597 17.94922942 12.75849988  7.65597128
  2.44667273 14.20888511 16.44046922 12.13289775 16.47074426 10.46082682
 17.53321003  9.40250986  3.80957857  1.14144124  5.95104479  2.55214873
 15.03623717 11.98151374  4.83774944 16.54707546 18.89287499  3.675759
  6.8657436   6.99032026  7.99076721  4.87949426 19.46837191  3.79614568
  6.72792438 11.26955338  6.90773764  6.33031443 11.34585115 14.81423048
 19.12230506 16.78362203  9.04538296  8.34005977 16.70997302 17.9694551
 19.14426204  2.99099817 18.26237007  7.91653762 13.8818763   9.96841786
 11.62077927  6.27019828 11.75004855  3.26273425 13.86333844  4.40368066
  6.43366134 17.98923619 14.73532524 14.07716539 10.19648463 14.55291085
  3.97172266  3.20639077 16.8158471  16.6589792  11.63183583 15.11471195
  2.85747805 16.43737151 11.25354336 14.25672552 13.4879228  13.83305666
 11.83302899 14.53237581 19.46133221 11.14732109  9.9364367  19.10141919
 19.84264944 10.82693011 13.59181113 14.12451847 18.80255227 12.91549099
 17.21204908 13.4456572  14.21361558 13.42743164 16.0411952   3.8614885
  6.79280165  3.20691356  9.28432956 11.11471783  6.88681969  4.34494846
 15.33603516  4.01908672 10.1040883  11.29970739  4.51261244 18.09215055
  3.61361207 15.59025818 17.302529    3.28711345 17.11885296 16.78262672
 14.58830822 18.84395424 12.59180699 14.15110893 17.45168623 14.7809512
  7.44724417 13.48323567  4.90653204 12.86433649  3.03764207  7.6844456
  8.62670856  1.93745013  6.35474297  4.69304274  1.41573836  2.35272146
 16.82016687 18.09218852 19.6064135  14.5920794  17.58456152 15.83111451
  5.37509657  3.34028129  4.1456231  13.9866889   6.18081457  4.49961602
  1.69223174  4.19981165 12.49173066 16.32761904]
				
			

Example: Draw a histogram with the given data set.

				
import numpy as np
import matplotlib.pyplot as plt

data_set = np.random.uniform(1, 20, 1000)

# 250 bins for 1000 values, so each bar holds only a few values on average
plt.hist(data_set, 250)

plt.show()

				
			
[Figure: histogram of the 1000 random values, drawn with 250 bins]
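Percentiles and data distribution fit together: when the values are spread evenly over a range, the q-th percentile should sit about q% of the way through that range. The sketch below checks this on 1000 uniform values; the names low, high, observed, and expected are introduced here for illustration, and the observed numbers will vary slightly from run to run.

```python
import numpy as np

low, high = 1, 20
data_set = np.random.uniform(low, high, 1000)

for q in (25, 50, 75):
    observed = np.percentile(data_set, q)
    # For evenly spread data, the q-th percentile is low + q% of the range.
    expected = low + (q / 100) * (high - low)
    print(q, round(observed, 2), expected)
```

With a larger data set the observed percentiles should move closer to the expected ones, which is one reason to test ideas on bigger samples.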
