This article covers the following model evaluation metrics and techniques for clustering algorithms.

## 1. Clustering Tendency

The Hopkins test, a statistical test for spatial randomness, can be used to assess clustering tendency. It measures how likely it is that the data points were generated by a uniform distribution, in which case there is no meaningful cluster structure to find.

## 2. Number of Clusters (k)

There is no definitive answer for the right number of clusters, because it depends on:

- (a) the shape of the distribution,
- (b) the scale of the data set,
- (c) the clustering resolution required by the user.

Although choosing the number of clusters is a subjective problem, there are two major approaches to finding a suitable value:

### 2.1 Domain Knowledge

Domain knowledge plays an important role in creating and selecting clusters or segments. General guidelines to follow while building segments (for example, store segments): (a) each segment should contain more than 5% of the data, and (b) the total number of segments or clusters should not be more than 5 to 7.
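These guidelines are easy to check once cluster labels are available. The sketch below is only an illustration: the helper `check_segment_guidelines` and its parameters `min_share` and `max_segments` are hypothetical names, with defaults taken from the thresholds mentioned above (7 is used as the upper limit).

```python
from collections import Counter

def check_segment_guidelines(labels, min_share=0.05, max_segments=7):
    """Report segments that violate the size and count guidelines."""
    counts = Counter(labels)
    total = len(labels)
    # Share of the data that falls into each segment.
    shares = {segment: count / total for segment, count in counts.items()}
    # A segment must hold MORE than min_share of the data to pass.
    too_small = sorted(seg for seg, share in shares.items() if share <= min_share)
    return {
        "n_segments": len(counts),
        "too_many_segments": len(counts) > max_segments,
        "too_small": too_small,
        "shares": shares,
    }

# Made-up labels: segment "C" holds only 3% of the data.
labels = ["A"] * 60 + ["B"] * 37 + ["C"] * 3
report = check_segment_guidelines(labels)
print(report["n_segments"], report["too_many_segments"], report["too_small"])
```

A tiny segment flagged this way is often a candidate for merging into a neighbouring segment, although that decision remains a domain call.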

### 2.2 Data-Driven Approach

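- **Empirical Method:** A rule of thumb sets the number of clusters to the square root of N/2, where N is the total number of data points, so that each cluster contains roughly the square root of 2N points.
- **Elbow Method:** The within-cluster variance measures the compactness of a cluster: the lower the variance, the more compact the cluster. The elbow method plots the total within-cluster variance against k and picks the k beyond which adding another cluster no longer reduces it substantially (the "elbow" of the curve).
- **Statistical Approach:** A statistical method called the gap statistic can be used to find the optimal number of clusters, k. It compares the within-cluster dispersion observed for each k with the dispersion expected under a reference distribution that has no cluster structure.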
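The empirical rule and the elbow method can be sketched together with NumPy. This is a minimal illustration, not a production k-means: `rule_of_thumb_k` and `kmeans_wcss` are hypothetical helper names, the data is synthetic, and the deterministic farthest-point initialisation is an assumption made here so that the example is reproducible.

```python
import math
import numpy as np

def rule_of_thumb_k(n_points):
    """Empirical estimate: k is roughly sqrt(N / 2)."""
    return max(1, round(math.sqrt(n_points / 2)))

def kmeans_wcss(X, k, n_iter=50):
    """Run plain Lloyd's k-means and return the within-cluster sum of squares."""
    X = np.asarray(X, dtype=float)
    # Farthest-point initialisation: start at the first row, then repeatedly
    # add the point farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)

    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

# Three well-separated synthetic blobs of 50 points each.
rng = np.random.default_rng(0)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(0.0, 0.5, size=(50, 2)) for c in blob_centers])

wcss = {k: kmeans_wcss(X, k) for k in range(1, 7)}
for k, value in wcss.items():
    print(k, round(value, 1))
print("rule-of-thumb k:", rule_of_thumb_k(len(X)))
```

On data like this the within-cluster sum of squares drops sharply up to k = 3 and flattens afterwards, which is the elbow. The rule of thumb depends only on N and suggests a much larger k here, so it is best treated as a starting point rather than an answer.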

## 3. Clustering Quality

A good clustering is characterized by minimal intra-cluster (within-cluster) distance and maximal inter-cluster (between-cluster) distance.
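To make the two quantities concrete, the sketch below computes the mean pairwise distance within clusters and between clusters for a given labelling. The helper `intra_inter_distances` is a hypothetical name, and averaging all pairwise Euclidean distances is just one simple way to summarise them. It assumes at least two clusters, and at least one cluster with two or more points.

```python
import numpy as np

def intra_inter_distances(X, labels):
    """Mean pairwise distance within clusters and between clusters."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise Euclidean distance matrix.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    same_cluster = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(X), dtype=bool)
    intra = D[same_cluster & not_self].mean()  # pairs in the same cluster
    inter = D[~same_cluster].mean()            # pairs in different clusters
    return float(intra), float(inter)

# Four points forming two obvious groups on the x-axis.
points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = intra_inter_distances(points, [0, 0, 1, 1])  # sensible grouping
bad = intra_inter_distances(points, [0, 1, 0, 1])   # groups split apart
print("good:", good)
print("bad:", bad)
```

The sensible labelling yields a small intra-cluster distance and a large inter-cluster distance, while the poor labelling reverses that relationship.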

Two types of measures are used to assess clustering quality or performance:

- **Extrinsic Measures:** Require ground truth labels. Examples are the Adjusted Rand Index, Fowlkes-Mallows score, mutual information-based scores, homogeneity, completeness and V-measure.
- **Intrinsic Measures:** Do not require ground truth labels. Examples are the Silhouette Coefficient, Calinski-Harabasz Index and Davies-Bouldin Index.
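The measures listed above are available in scikit-learn's `sklearn.metrics` module. The sketch below assumes scikit-learn is installed and uses synthetic blobs with k-means purely as a stand-in clustering. The blob centers, `cluster_std` and `random_state` values are arbitrary choices for this example.

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic, well-separated data with known ground truth labels.
X, y_true = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [0, 6]],
    cluster_std=0.6,
    random_state=0,
)
y_pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Extrinsic measures compare predicted labels with the ground truth.
extrinsic = {
    "adjusted_rand": metrics.adjusted_rand_score(y_true, y_pred),
    "fowlkes_mallows": metrics.fowlkes_mallows_score(y_true, y_pred),
    "adjusted_mutual_info": metrics.adjusted_mutual_info_score(y_true, y_pred),
    "homogeneity": metrics.homogeneity_score(y_true, y_pred),
    "completeness": metrics.completeness_score(y_true, y_pred),
    "v_measure": metrics.v_measure_score(y_true, y_pred),
}

# Intrinsic measures use only the data and the predicted labels.
intrinsic = {
    "silhouette": metrics.silhouette_score(X, y_pred),                # higher is better
    "calinski_harabasz": metrics.calinski_harabasz_score(X, y_pred),  # higher is better
    "davies_bouldin": metrics.davies_bouldin_score(X, y_pred),        # lower is better
}

for name, value in {**extrinsic, **intrinsic}.items():
    print(f"{name}: {value:.3f}")
```

Extrinsic scores do not depend on how cluster IDs are numbered, so a prediction that matches the ground truth up to a relabelling still scores perfectly. The intrinsic scores differ in direction: a lower Davies-Bouldin Index indicates better separation, whereas higher is better for the other two.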