*Hyperparameter tuning* is the process of choosing a set of optimal hyperparameters for a learning algorithm. It matters because a model's performance depends heavily on the hyperparameter values specified, and these values tailor the model to the problem at hand.


## Brief Explanation of Hyperparameter Tuning

Just as we turn the knobs of a radio to get a clear signal, or turn the pegs of a guitar to bring its strings to the right pitch, we adjust hyperparameters, the settings of an algorithm, to optimize its performance.

*Hyperparameters* must be set by the data scientist before training. Consider the case of a **random forest algorithm**. Its hyperparameters include:

- n_estimators = number of trees in the forest
- max_features = max number of features considered for splitting a node
- max_depth = max number of levels in each decision tree
- min_samples_split = min number of data points placed in a node before the node is split
- and many more.

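Printed with its default settings, the classifier looks like this (the output comes from an older scikit-learn release; recent releases no longer accept `min_impurity_split` or `max_features='auto'`):

    RandomForestClassifier(
        criterion='gini', max_depth=None, max_features='auto',
        max_leaf_nodes=None, max_samples=None,
        min_impurity_decrease=0.0, min_impurity_split=None,
        min_samples_leaf=1, min_samples_split=2,
        min_weight_fraction_leaf=0.0, n_estimators=100,
        n_jobs=None, random_state=None)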

A model can have many hyperparameters, and finding the best combination of values can be treated as a search problem.
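As a small illustration, hyperparameters are passed to the estimator's constructor before any training happens and can be read back with `get_params()`. The specific values below are arbitrary choices for the sake of the example, not recommendations.

```python
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters are fixed when the estimator is created, before training.
# The values here are arbitrary, chosen only to illustrate the idea.
rfc = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_split=4,
    random_state=0,
)

# get_params() returns the current hyperparameter settings as a dict.
params = rfc.get_params()
for name in ("n_estimators", "max_depth", "min_samples_split"):
    print(name, "=", params[name])
```

Any hyperparameter not passed explicitly keeps the default of the installed scikit-learn version.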

## How to Tune Hyperparameters

The optimal hyperparameters are practically impossible to determine ahead of time. The most reliable way to find them is therefore to try many different combinations and evaluate the performance of each resulting model. However, choosing hyperparameters by evaluating each model only on the training set can lead to *overfitting*.

## Why Cross-Validation Is Required While Tuning Hyperparameters

A model optimized only against the training data will perform well on the training set but poorly on test data; that is, it fails to generalize to new data. A model that scores highly on the training set but poorly on the test (unseen) set is said to be *overfitting*.
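The gap between training and test performance can be seen in a minimal sketch. It assumes synthetic data from `make_classification` and, for simplicity, a single unconstrained decision tree rather than a full random forest; both are stand-ins chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with some labels assigned at random (flip_y),
# so no model can be perfect on unseen samples.
X, y = make_classification(
    n_samples=500, n_features=20, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# An unconstrained tree keeps splitting until it memorizes the training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"training accuracy: {train_acc:.2f}")
print(f"test accuracy:     {test_acc:.2f}")
```

The training accuracy alone would suggest a flawless model; only the held-out score reveals how much of that is memorized noise.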

Therefore, we perform *cross-validation*, which guards against overfitting.

We will use *K-fold cross-validation*, in which the dataset is partitioned into *k* *equal-sized* subsets (or *folds*). Of these *k* folds, a single fold is retained as the validation data for testing the model, and the remaining *k−1* folds are used as training data. This cross-validation process is then repeated *k* times, with each of the *k* folds used exactly once as the validation data. At the end, we average the performance across the folds to produce a single estimate.
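Here is a minimal sketch of this procedure using scikit-learn's `KFold` and `cross_val_score`. The iris dataset and the forest size are assumptions made only to keep the example self-contained and quick to run.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Shuffle before splitting because the iris samples are ordered by class.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0)

# One accuracy score per fold; each fold is the validation data exactly once.
scores = cross_val_score(model, X, y, cv=kfold)
mean_score = scores.mean()
print("per-fold accuracy:", scores)
print("mean accuracy:", mean_score)
```

The averaged score is the single estimate used to compare one hyperparameter setting against another.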

### K-fold cross-validation

*Figure: an example of 5-fold cross-validation (k = 5).*

For example, consider fitting a model with *k* = 5. In the first iteration, we train on the last four folds and evaluate on the first. In the second, we train on the first, third, fourth, and fifth folds and evaluate on the second, and so on.
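The rotation of folds can be made concrete by printing the indices that `KFold` produces. The ten-sample toy array is an assumption chosen so that each of the five folds holds exactly two samples.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten toy samples, so that each of the 5 folds holds exactly two of them.
X = np.arange(10).reshape(-1, 1)

# Without shuffling, the folds are contiguous blocks taken in order.
kfold = KFold(n_splits=5)
splits = [
    (train_idx.tolist(), val_idx.tolist())
    for train_idx, val_idx in kfold.split(X)
]

for i, (train_idx, val_idx) in enumerate(splits, start=1):
    print(f"iteration {i}: train on {train_idx}, validate on {val_idx}")
```

The first iteration validates on samples 0 and 1 and trains on the rest, the second validates on samples 2 and 3, and so on until every fold has been held out once.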

For hyperparameter tuning, we repeat the entire K-fold CV process many times, each time with different model settings (a different set of hyperparameters). We then compare all of the models, select the best one, train it on the full training set, and evaluate it on the testing set. So each time we want to assess a different set of hyperparameters, we have to split the training data into K folds and train and evaluate the model K times. For example, with *10* sets of hyperparameters and *5-fold CV*, that comes to *50* training loops. Fortunately, all of this is implemented in the scikit-learn library.
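Written out by hand, the loop looks roughly like the sketch below. The ten candidate settings and the iris dataset are hypothetical, picked only to reproduce the 10 x 5 = 50 count from the text.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Ten hypothetical hyperparameter sets: 5 depths x 2 forest sizes.
candidates = [
    {"max_depth": depth, "n_estimators": n_trees}
    for depth in (1, 2, 3, 4, 5)
    for n_trees in (10, 20)
]

n_folds = 5
results = []
for params in candidates:
    model = RandomForestClassifier(random_state=0, **params)
    # cross_val_score trains and evaluates the model once per fold
    # (with an integer cv and a classifier, the folds are stratified).
    fold_scores = cross_val_score(model, X, y, cv=n_folds)
    results.append((fold_scores.mean(), params))

total_fits = len(candidates) * n_folds
best_score, best_params = max(results, key=lambda r: r[0])
print("training loops:", total_fits)
print("best mean accuracy:", best_score, "with", best_params)
```

`GridSearchCV` automates this loop and adds the final refit on the full training set.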

## Grid Search with Cross-Validation

Grid search methodically builds and evaluates a model for each combination of the hyperparameter values specified in a grid. We explicitly list every setting to try in the grid:

    from sklearn.model_selection import GridSearchCV

    # Create the combinations of parameters for grid search
    parameters_grid = {
        'max_depth': [50, 70, 90, 110],
        'min_samples_leaf': [3, 4, 5],
        'min_samples_split': [8, 10, 12],
        'n_estimators': [100, 200, 300, 600]
    }

The total number of combinations for the grid above is the product of the number of options for each parameter (4 x 3 x 3 x 4 = **144**). With 5-fold cross-validation, this number is multiplied by 5, giving 720 model fits in total. So with many parameters and many options per parameter, **grid search will take a long time**.
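The count can be checked without training anything by using scikit-learn's `ParameterGrid`, which enumerates the same combinations that `GridSearchCV` would try. The grid is repeated here so the snippet runs on its own.

```python
from sklearn.model_selection import ParameterGrid

parameters_grid = {
    "max_depth": [50, 70, 90, 110],
    "min_samples_leaf": [3, 4, 5],
    "min_samples_split": [8, 10, 12],
    "n_estimators": [100, 200, 300, 600],
}

# Every combination of one value per parameter.
n_combinations = len(ParameterGrid(parameters_grid))

# Each combination is fitted once per fold in 5-fold cross-validation.
n_fits = n_combinations * 5

print("combinations:", n_combinations)  # 4 * 3 * 3 * 4 = 144
print("model fits:", n_fits)            # 144 * 5 = 720
```

Adding one more parameter with three options would triple both numbers, which is why grids grow expensive so quickly.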

Let's set up the random forest classifier using the scikit-learn library.

    from sklearn.ensemble import RandomForestClassifier

    # Create a tree-based model
    rfc = RandomForestClassifier()

    # Instantiate the grid search model
    grid_search = GridSearchCV(estimator=rfc, param_grid=parameters_grid,
                               cv=5, n_jobs=-1, verbose=2)

Next, we fit the model, check the best hyperparameter combination, and make predictions.

    # Fit the grid search to the data
    grid_search.fit(train_features_X, train_labels_Y)

    # Show the selected parameter combination
    print(grid_search.best_params_)

    # Keep the model that was refit with the best parameter combination
    best_grid = grid_search.best_estimator_

    # Predict using best_grid
    Test_Prediction_Y = best_grid.predict(test_features_X)
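Since the training and test arrays are not defined in this post, here is a self-contained sketch of the same workflow. It assumes the iris dataset in place of the real data and a deliberately tiny grid so the search finishes in seconds; the variable names are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# A deliberately tiny grid (2 x 2 = 4 combinations) so the search is quick.
small_grid = {"max_depth": [2, 4], "n_estimators": [10, 50]}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=small_grid,
    cv=5,
)
search.fit(train_X, train_y)

# best_estimator_ is the model refit on all of train_X with the best settings.
predictions = search.best_estimator_.predict(test_X)
test_accuracy = accuracy_score(test_y, predictions)

print("best parameters:", search.best_params_)
print("mean CV accuracy of best setting:", search.best_score_)
print("held-out test accuracy:", test_accuracy)
```

The test set is touched only once, after the search, so the final accuracy is not influenced by the tuning itself.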

## Summary

Hyperparameter tuning is an important step for improving algorithm performance. It evaluates various hyperparameter combinations to find the best-performing set.

In this post, we covered hyperparameter tuning in Python using the scikit-learn library, applying grid search with cross-validation to select the best-scoring combination for a random forest.

If you have any comments or questions, feel free to leave your feedback below. Follow me on **Medium**. You can always reach me on **LinkedIn**.
