Decision Tree is one of the easiest and popular supervised learning algorithms to understand and interpret. It is a nonparametric model i.e. no assumptions are made while building decision trees. Decision Tree algorithm falls under supervised learning algorithms and can be used for solving both **regression and classification **problems.

Table of Contents

- Decision Tree Example
- Decision Tree Terminologies
- Types of Decision Trees
- How Decision Trees Work?
- Splitting Criteria
- Gini Index
- Information Gain

- Python Code – Decision Tree Visualization
- Video

## 1. Decision Tree Example

Let’s first understand the output of Decision Tree using a simple example and see why it is very easy to interpret.

Assume that you want to Order Food, how do you decide whether we will order or not? Supposing you have enough money for order :p. So, you generally check **two factors**: a) whether you have coupon or not and b) how much is the delivery time. You have already stored data for your previous decision making while ordering.

Using the stored data, let’s create a simple Decision Tree, as shown below:

Let’s understand how the prediction works. You have provided two input features to the decision tree model (Delivery Time & Coupon) and taking both these factors into account, you decide if you will order food or not.

**For example,** if the delivery time is 10 mins and you have coupons. So according to the above Decision Tree flowchart, you are likely to order food.

## 2. Decision Tree Terminologies

Let’s get familiar with some of the Decision Tree terminologies:

The name Decision Tree itself suggests that it uses a flowchart like a tree structure and show decisions or predictions to be made using a series of feature-based splits.

**Root Node:**Root Node is present at the beginning of a decision tree. It represents the entire dataset which further gets divided according to various features.**Leaf Node:**Leaf nodes (or Terminal Nodes) are the final output node, and further splitting of the tree is not possible.**Splitting:**Splitting is a process of dividing a node into two or more sub-nodes according to given conditions.**Branch (Sub-Tree):**A tree or sub-section formed by splitting the tree.**Decision Node:**When a node splits into further sub-nodes, it is called decision node.**Pruning:**Pruning is the process of removing the unwanted branches (sub-trees) from the tree to prevent overfitting. It is the opposite process of splitting.**Parent and Child Node:**A node, which is divided into sub-nodes is called parent node of sub-nodes and sub-nodes are the child of parent node. e.g. root node is parent node, and sub-nodes will be child nodes.

## 3. Types of Decision Trees

The Decision Tree algorithm selection is generally based on the type of target variables. Let us look at some algorithms used in Decision Trees:

**CART**(Classification And Regression Tree)**ID3**(extension of D3)**C4.5**(successor of ID3)**CHAID**(Chi-square Automatic Interaction Detection)**MARS**(Multivariate Adaptive Regression Splines)

For this article, we will consider CART Decision Tree which used Gini Index as splitting criteria.

## 4. How Decision Trees Work?

Decision trees are upside down. The root is at the top and then this root is split into various several nodes. In layman terms, Decision Trees asks bunch ofTrue or Falsequestions at each node. In response to these questions, we split the data into two or more subsets. Thegoal of the question is to un-mix the data or reduce uncertainty at each node.We will repeat this process till there are no further questions to ask or data is in purest form (i.e. leaf node) or some condition is met.

### Decision Tree CART Algorithm Steps:

- Tree begins at the root node (say S), which contains the complete dataset.
- Find the best Question to split the data at node, which helps in increase the purity (reduce the uncertainty) or un-mix the data.
- The purity or un-mixing is calculated
**Gini Impurity (GI)**and**Information gain(IG)**. - Selects the Question which has the smallest Gini Impurity or Largest Information gain.
- The set S is then split by the selected Question to produce a subset of the data.
- Recursively (Repeatedly) make new decision trees using the subsets of the data created in Step-5. The algorithm continues to recur on each subset, considering only Questions never selected before.
- Continue this process until a stage is reached where we cannot further split the nodes (called leaf node) or some condition is met.

**Now, few question arises in our mind:**

- What type of Questions can we ask?
- At each node, we pass a list of features (independent variables). So every value of every feature can be asked as a True or False statement in a question.

- How to decide which Question to ask when?
- Our goal is to ask question which helps in increasing the purity or un-mixing of the data. We use
**splitting criteria**like Entropy, Gini index or Information Gain.

- Our goal is to ask question which helps in increasing the purity or un-mixing of the data. We use

## 5. Splitting Criteria

How to decide which Question to ask when or at which node is a complicated step. By just randomly selecting any question at any node to be the root can’t solve the issue. For solving this problem, we can use some **splitting criteria** like :

**Gini Index****Entropy****Information Gain****Gain Ratio****Reduction in Variance****Chi-Square**

The best Question is the one that reduces the uncertainty the most. Splitting criteria likeGini index let us quantify how much uncertainty is at a node. Information Gain let us quantify how much a Question reduces that uncertainty.

Let’s first understand what is Gini Index and Information Gain and How to calculate them?

### Gini Index

The Gini index (Gini impurity) is a measurement of the likelihood of **incorrect classification** of a new instance of a random variable. If that new instance were randomly classified according to the distribution of class labels from the given dataset.

**The lower the Gini index the better, higher value of Gini index implies higher inequality (mixing)**.

**Gini Impurity Formula:**

where p is the probability of a certain classification i

Gini Index works with the categorical target (dependent) variable. It performs only Binary splits.

**Steps to Calculate Gini index for a Split:**

- Calculate Gini for sub-nodes, using the above formula for 1 – Σ (p
_{i})². - Calculate the Gini index for split using the weighted Gini score of each node of that split.

CART (Classification and Regression Tree) uses the Gini index method to create split points.

Let’s take above example,

We can ask question using independent variables (features). In this data set we have 2 features Coupon (categorical variable) and Delivery Time (numerical variable). Let’s use both of these features to ask questions, create a split and measure Gini Impurity.

### Information gain

Information gain (or **IG**) is a statistical property that measures how well a given condition separates the training set examples according to their target classification. Constructing a decision tree is all about **finding an attribute that returns the highest information gain and the smallest entropy.**

Information gain is a decrease in entropy. It computes the difference between entropy before split and average entropy after split of the dataset based on given attribute values. ID3 (Iterative Dichotomiser) decision tree algorithm uses information gain.

In the above image, we can check that Delivery Time is giving greater reduction in Gini Impurity or more Information Gain as compared to Coupon at root node:

- Information Gain (Delivery Time) = 0.18
- Information Gain (Coupon) = 0.01

So, Decision Tree algorithm will select Delivery Time as splitting criteria. This step is used Recursively (Repeatedly) to make new decision trees using the subsets of the data. Until a stage is reached where we cannot further split the nodes (called leaf node) or some condition is met.

## 6. Code

Let’s solve above example in python code by training and visualizing decision tree.

**Step 1: Objective**

Assume we want to Order Food, how can we decide whether we will order or not based on our past decision making? So, we generally check two factors where we have coupon or not and how much is the delivery time.

**Step 2: Data Collection**

```
# import libraries
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
```

```
# create dataset
data = {'Coupon':[0, 0, 1, 1, 1],
'DeliveryTime':[15, 15, 20, 25, 30],
'Order':[1, 0, 1, 1, 0]
}
# create DataFrame
df = pd.DataFrame(data)
df.head()
```

Coupon | Delivery Time | Order Food |

0 | 15 | 1 |

0 | 15 | 0 |

1 | 20 | 1 |

1 | 25 | 1 |

1 | 30 | 0 |

**Step 3: Data Preparation**

```
# create X (independent variables) and y (target variable)
X=df.drop(["Order"],axis=1)
y=df["Order"]
```

**Step 4: Modeling**

```
# create dt object for decision tree algorithm
dt = DecisionTreeClassifier()
# fit the model
dt.fit(X, y)
```

Decision Tree Classifier scikit-learn Documentation.

**Step 5: Decision Tree Visualization**

```
# plot trained decision tree using sklearn.tree package
plt.figure(figsize=(8,8))
tree.plot_tree(dt,feature_names=X.columns,class_names=['No','Yes'],filled=True)
```

## 7. Video

Learn more Data Science Algorithms **here**.