# Model Evaluation for Classification Algorithms

This article covers the model evaluation metrics and techniques commonly used for classification algorithms:

1. Confusion Matrix
2. AUC-ROC
3. Lift Chart
4. Gain Chart
5. KS Statistic
6. F1 Score

## 1. Confusion Matrix


A confusion matrix provides an easy summary of the predictive results in a classification problem. Correct and incorrect predictions are summarized in a table with their counts, broken down by class. This section explains the confusion matrix, precision, recall, F1 score, specificity, etc. using examples and code.

A single accuracy value is unreliable when the classes are imbalanced.

For example, consider a dataset of 100 patients in which 5 have diabetes and 95 are healthy. Even if our model only predicts the majority class, i.e. that all 100 people are healthy, the classification accuracy is 95%, which gives a completely wrong impression of its performance. This is why we need a confusion matrix.
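The accuracy trap in the example above can be checked directly; a minimal sketch, using the patient counts from the example and scikit-learn:

```python
# 5 diabetic (positive) patients, 95 healthy, and a "model" that
# always predicts the majority class (healthy).
from sklearn.metrics import accuracy_score, confusion_matrix

actual = [1] * 5 + [0] * 95   # 1 = diabetic, 0 = healthy
predicted = [0] * 100         # majority-class prediction for everyone

print('Accuracy Score :', accuracy_score(actual, predicted))  # 0.95
print('Confusion Matrix :')
print(confusion_matrix(actual, predicted))  # all 5 diabetic patients are missed
```

Accuracy comes out at 95% even though every diabetic patient is misclassified, which the confusion matrix makes obvious at a glance.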

```python
# Python script for confusion matrix creation
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

actual = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat']

results = confusion_matrix(actual, predicted)
print('Confusion Matrix :')
print(results)
print('Accuracy Score :', accuracy_score(actual, predicted))
print('Classification Report :')
print(classification_report(actual, predicted))
```

## 2. AUC-ROC

The ROC, or Receiver Operating Characteristic, curve is a plot of the True Positive Rate (Recall) on the y-axis against the False Positive Rate on the x-axis for every possible classification threshold.

A ROC curve gives more information than a confusion matrix, as it visualizes all possible classification thresholds, whereas a confusion matrix is created for only a single threshold.
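The ROC points and the area under the curve (AUC) can be computed with scikit-learn; a minimal sketch, where the synthetic dataset and logistic regression model are illustrative choices:

```python
# Compute ROC curve points and AUC for a binary classifier.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# One (FPR, TPR) point per classification threshold
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
print('AUC :', roc_auc_score(y_test, y_prob))
```

Plotting `fpr` against `tpr` gives the ROC curve itself; the AUC summarizes it as a single number between 0 and 1, where 0.5 corresponds to random guessing.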

## 3. Lift Chart

Gain and Lift charts are based on ranking the predicted probabilities in decreasing order. Steps to create a Lift/Gain chart:

- Step 1: Predict or calculate the probability for each observation.
- Step 2: Rank these probabilities in decreasing order.
- Step 3: Using the output of Step 2, build deciles by dividing the observations into 10 equal parts.
- Step 4: Calculate metrics such as the response rate, good observations, bad observations, etc. for each decile.
```python
# REPRODUCIBLE EXAMPLE
# Install the kds library first: pip install kds

# Load dataset and train-test split
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=3)
clf = tree.DecisionTreeClassifier(max_depth=1, random_state=3)
clf = clf.fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)

# The magic happens here
import kds
kds.metrics.plot_lift(y_test, y_prob[:, 1])
```

## 4. Gain Chart

```python
# REPRODUCIBLE EXAMPLE
# Install the kds library first: pip install kds

# Load dataset and train-test split
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=3)
clf = tree.DecisionTreeClassifier(max_depth=1, random_state=3)
clf = clf.fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)

# The magic happens here
import kds
kds.metrics.plot_cumulative_gain(y_test, y_prob[:, 1])
```

## 5. Kolmogorov-Smirnov – KS Statistic

The Kolmogorov-Smirnov plot measures the degree of separation between the good (positive) and bad (negative) observations. The KS statistic of a classification model ranges between 0 and 100; the higher the value, the better the model is at separating the positive from the negative observations.

```python
# REPRODUCIBLE EXAMPLE
# Install the kds library first: pip install kds

# Load dataset and train-test split
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=3)
clf = tree.DecisionTreeClassifier(max_depth=1, random_state=3)
clf = clf.fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)

# The magic happens here
import kds
kds.metrics.plot_ks_statistic(y_test, y_prob[:, 1])
```

## 6. F1 Score

The F1 score is the harmonic mean of Precision and Recall: F1 = 2 × (Precision × Recall) / (Precision + Recall). It combines the two into a single number, which makes it useful for comparing models with different Precision and Recall values. The harmonic mean punishes extreme values more than the arithmetic mean, so a model must do well on both Precision and Recall to achieve a high F1. The higher the F1 score, the better the model.
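The harmonic-mean formula above can be verified against scikit-learn's built-in metric; a minimal sketch, where the label vectors are illustrative:

```python
# F1 as the harmonic mean of precision and recall, computed both
# manually and with scikit-learn.
from sklearn.metrics import precision_score, recall_score, f1_score

actual = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

p = precision_score(actual, predicted)  # TP / (TP + FP)
r = recall_score(actual, predicted)     # TP / (TP + FN)

print('F1 (manual)  :', 2 * p * r / (p + r))          # both print 8/11 = 0.727...
print('F1 (sklearn) :', f1_score(actual, predicted))
```

Here TP = 4, FP = 1, and FN = 2, so Precision = 0.8 and Recall = 0.667; the harmonic mean pulls the F1 down to about 0.727, closer to the weaker of the two.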