Introduction to Machine Learning Decision Tree

In this article, you will learn about decision trees in machine learning.

Before moving ahead, you may want to read Introduction to Machine Learning Train/Test.

Decision Tree

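A decision tree is a model that makes decisions based on previously observed data. It works like a flowchart: each node asks a question about one feature, and each branch leads either to another question or to a final decision.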
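
To make the idea concrete, a fitted decision tree can be read as a chain of if/else questions about the input. The sketch below is written by hand for illustration only: the function name will_be_present and the thresholds 40 and 20 are made up, not learned from DATA.csv.

```python
def will_be_present(marks, rollno):
    """Hand-written stand-in for a tiny decision tree (thresholds are made up)."""
    if marks <= 40:      # first question (root node)
        return 'N'
    if rollno <= 20:     # second question, asked only when marks > 40
        return 'Y'
    return 'N'


print(will_be_present(marks=45, rollno=14))  # prints Y
```

A real tree learns which questions to ask, and in what order, from the training data instead of having them written by hand.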

How Does It Work?

Example: Read the file and print the information.

				
import pandas as pd

info = pd.read_csv("DATA.csv")  # read the file into a DataFrame

print(info)

				
			

Click to Download DATA.csv file.

As a result, the dataset is read and printed.

Note: To build a decision tree, every value in the dataset must be numeric.

Some columns in this dataset are not numeric, so convert them to numerical values before making a decision tree.
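
One quick way to see which columns still need converting is to ask pandas for the non-numeric columns. This is a minimal sketch on a made-up two-row frame that only borrows the article's column names; the values are not taken from DATA.csv.

```python
import pandas as pd

# Made-up sample rows using the article's column names (not DATA.csv)
info = pd.DataFrame({
    'name': ['A', 'B'],
    'class': ['I', 'II'],
    'rollno': [1, 2],
    'marks': [35, 80],
    'present': ['Y', 'N'],
})

# Columns whose dtype is not numeric still need to be converted
non_numeric = info.select_dtypes(exclude='number').columns.tolist()
print(non_numeric)  # ['name', 'class', 'present']
```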

To convert a column into numerical values, use the pandas map() method, which takes a dictionary (pairs of key: value) as a parameter.

For example, take the “class” column in the “DATA.csv” file.

{‘I’: 0, ‘II’: 1, ‘III’: 2}

This means: convert the value ‘I’ to 0, ‘II’ to 1, and ‘III’ to 2.

Example: Change the values of the “name”, “class”, and “present” columns to numerical values.
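
As a small self-contained illustration of map(), the sketch below applies the dictionary above to a hand-made Series (the sample values are made up and do not come from DATA.csv). It also shows that a value with no matching key, here 'XX', becomes NaN, so the dictionary should cover every value that appears in the column.

```python
import pandas as pd

# Hypothetical sample column; not taken from DATA.csv
classes = pd.Series(['I', 'III', 'II', 'XX'])

mapping = {'I': 0, 'II': 1, 'III': 2}
converted = classes.map(mapping)

print(converted)
# Values without a matching key become NaN
print(converted.isna().sum())
```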

				
import pandas as pd

info = pd.read_csv("DATA.csv")

# convert dataset to the numerical value
Dictionary = {'A': 0, 'B': 1, 'C': 2, 'D': 3,
              'E': 4, 'F': 5, 'G': 6, 'H': 7, 'I': 8, 'J': 9, 'K': 10, 'L': 11}
info['name'] = info['name'].map(Dictionary)

Dictionary = {'I': 0, 'II': 1, 'III': 2, 'IV': 3, 'V': 4, 'VI': 5, 'VII': 6,
              'VIII': 7, 'IX': 8, 'X': 9, 'XI': 10, 'XII': 11}
info['class'] = info['class'].map(Dictionary)

# 'Y' becomes 0 and 'N' becomes 1 (repeated keys would overwrite each other)
Dictionary = {'Y': 0, 'N': 1}
info['present'] = info['present'].map(Dictionary)

print(info)


				
			

As a result, the three converted columns now contain numerical values.

Now we have to separate the feature columns from the target column.

The feature columns are the columns we predict from. The target column is the column whose values we try to predict.

Example: Separate the columns as “features” and “target.”

				
# x = feature columns; y = target column
features = ['name', 'class', 'rollno', 'marks']

x = info[features]
y = info['present']

print(x)
print(y)

				
			

Note: Add the code from the previous example, except “print(info)”, before this code and then run it.

Now, we are ready to make a decision tree.

Example: Create a Decision Tree, save it as an image, and show it.

				
# import the necessary modules

import matplotlib.image as pltimg
import matplotlib.pyplot as plt
import pandas as pd
import pydotplus
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

info = pd.read_csv("DATA.csv")

# convert dataset to the numerical value
Dictionary = {'A': 0, 'B': 1, 'C': 2, 'D': 3,
              'E': 4, 'F': 5, 'G': 6, 'H': 7, 'I': 8, 'J': 9, 'K': 10, 'L': 11}

info['name'] = info['name'].map(Dictionary)

Dictionary = {'I': 0, 'II': 1, 'III': 2, 'IV': 3, 'V': 4, 'VI': 5, 'VII': 6,
              'VIII': 7, 'IX': 8, 'X': 9, 'XI': 10, 'XII': 11}
info['class'] = info['class'].map(Dictionary)

# 'Y' becomes 0 and 'N' becomes 1 (repeated keys would overwrite each other)
Dictionary = {'Y': 0, 'N': 1}
info['present'] = info['present'].map(Dictionary)

features = ['name', 'class', 'rollno', 'marks']

x = info[features]
y = info['present']

decision_tree = DecisionTreeClassifier()
decision_tree = decision_tree.fit(x, y)
data = tree.export_graphviz(
    decision_tree, out_file=None, feature_names=features)
graph = pydotplus.graph_from_dot_data(data)
graph.write_png('first_decision_tree.png')

Image = pltimg.imread('first_decision_tree.png')
Image_plot = plt.imshow(Image)
plt.show()
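
Writing the PNG in the example above relies on pydotplus and on Graphviz being installed. As an alternative sketch, scikit-learn's export_text prints a fitted tree as plain text. The small DataFrame below is made-up stand-in data with the same column names as the article, already in numeric form; in practice you would pass the real decision_tree and features from the example above.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Made-up rows standing in for DATA.csv (already numeric)
info = pd.DataFrame({
    'name':    [0, 1, 2, 3, 4, 5],
    'class':   [0, 1, 2, 3, 4, 5],
    'rollno':  [1, 2, 3, 4, 5, 6],
    'marks':   [35, 80, 42, 90, 20, 75],
    'present': [1, 0, 1, 0, 1, 0],   # 0 = 'Y', 1 = 'N'
})

features = ['name', 'class', 'rollno', 'marks']
decision_tree = DecisionTreeClassifier(random_state=0)
decision_tree.fit(info[features], info['present'])

# Text rendering of the tree: one line per split or leaf
rules = export_text(decision_tree, feature_names=features)
print(rules)
```

Each indented line is one question in the tree, and the lines starting with "class:" are the final decisions.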

				
			

Predict Values

Now we use the Decision Tree to predict new values.

Example: Use the predict() method to predict new values.

				
import pandas as pd
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier

info = pd.read_csv("DATA.csv")

# convert dataset to the numerical value
Dictionary = {'A': 0, 'B': 1, 'C': 2, 'D': 3,
              'E': 4, 'F': 5, 'G': 6, 'H': 7, 'I': 8, 'J': 9, 'K': 10, 'L': 11}

info['name'] = info['name'].map(Dictionary)

Dictionary = {'I': 0, 'II': 1, 'III': 2, 'IV': 3, 'V': 4, 'VI': 5, 'VII': 6,
              'VIII': 7, 'IX': 8, 'X': 9, 'XI': 10, 'XII': 11}
info['class'] = info['class'].map(Dictionary)

# 'Y' becomes 0 and 'N' becomes 1 (repeated keys would overwrite each other)
Dictionary = {'Y': 0, 'N': 1}
info['present'] = info['present'].map(Dictionary)

features = ['name', 'class', 'rollno', 'marks']

x = info[features]
y = info['present']

decision_tree = DecisionTreeClassifier()
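decision_tree = decision_tree.fit(x, y)
# sample in feature order: name 'G' -> 6, class 'VII' -> 6, rollno 14, marks 45
print(decision_tree.predict(pd.DataFrame([[6, 6, 14, 45]], columns=features)))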

print("[0] means 'Y'")
print("[1] means 'N'")


				
			

Lines 1 to 3:

Import the modules needed to build the decision tree and make predictions from the dataset.

Line 5:

Use the pandas read_csv() function to read the data file.

Lines 7 to 19:

Use dictionaries to convert the non-numeric values into numeric values. Each non-numeric value is a key, and each key is assigned a number in the format key: value.

Lines 21 to 24:

Specify the dataset columns as features (x) and target (y).

Features – The feature columns are the columns whose data is used to make the prediction.

Target – The target column is the column that will be predicted.

Line 29:

Predict whether a student will be present or not, based on the given values.

Will a student be present if the name is “G”, in class VII, with rollno 14 and marks 45?
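
Since predict() returns the numeric code rather than the original label, it can help to translate the result back. This is a minimal sketch, assuming the target was encoded as 'Y' to 0 and 'N' to 1; the training rows are made up for illustration, and the names target_codes and code_to_label are hypothetical.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

features = ['name', 'class', 'rollno', 'marks']
target_codes = {'Y': 0, 'N': 1}
# Reverse lookup: numeric code back to the original label
code_to_label = {code: label for label, code in target_codes.items()}

# Made-up, already-numeric training rows (not DATA.csv)
x = pd.DataFrame([[0, 0, 1, 35], [1, 1, 2, 80], [2, 2, 3, 42], [3, 3, 4, 90]],
                 columns=features)
y = pd.Series([1, 0, 1, 0], name='present')

decision_tree = DecisionTreeClassifier(random_state=0).fit(x, y)

# Wrap the new sample in a DataFrame so the column names match the training data
sample = pd.DataFrame([[6, 6, 14, 45]], columns=features)
code = int(decision_tree.predict(sample)[0])
print(code, '->', code_to_label[code])
```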

As shown, a decision tree built from the dataset can be used to predict new values.

If you find anything incorrect in the topic discussed above, or have any further questions, please comment below.
