Table of Contents
1. Filter methods
Filter methods are generally the first step in any feature selection pipeline.
Filter methods fall into two broad categories: univariate filter methods and multivariate filter methods.
1.1 Univariate filter methods:
Individual features are ranked according to a specific criterion, and the top N features are then selected. Different ranking criteria can be used for univariate filter methods, for example the Fisher score, mutual information, and the variance of the feature.
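The rank-and-select step described above can be sketched as follows. This is a minimal illustration, assuming scikit-learn's SelectKBest with mutual information as the ranking criterion; the synthetic dataset and the choice N = 3 are assumptions made for the example, not part of the original notes.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data (an assumption for illustration): 10 features, 3 informative.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3,
    n_redundant=0, random_state=0,
)

# Score every feature independently against the target, then keep the top N.
N = 3
score_func = partial(mutual_info_classif, random_state=0)
selector = SelectKBest(score_func=score_func, k=N)
X_top = selector.fit_transform(X, y)

# Column indices of the N selected features.
selected = selector.get_support(indices=True)
print("scores:", np.round(selector.scores_, 3))
print("selected feature indices:", selected)
```

Because each feature is scored on its own, two near-identical features would both receive a high score, which is the redundancy problem noted for univariate methods.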
Disadvantage: Univariate filter methods may select redundant features, because the relationship between individual features is not taken into account when making decisions. They are, however, ideal for removing constant and quasi-constant features from the data.
- Constant: Constant features contain only one value for all the rows in the dataset. They can be removed with the VarianceThreshold class of Python’s scikit-learn. See https://stackabuse.com/applying-filter-methods-in-python-for-feature-selection/
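As a hedged sketch of how constant and quasi-constant features could be removed with VarianceThreshold: the toy DataFrame, its column names, and the 0.01 cut-off are hypothetical values chosen for illustration.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy data (hypothetical): "const" never changes, "quasi" changes in 1 of 100 rows.
df = pd.DataFrame({
    "const": [1] * 100,
    "quasi": [0] * 99 + [1],
    "useful": list(range(100)),
})

# threshold=0.0 drops only zero-variance (constant) columns.
constant_filter = VarianceThreshold(threshold=0.0)
constant_filter.fit(df)
kept_constant = df.columns[constant_filter.get_support()].tolist()

# A small positive threshold also drops quasi-constant columns.
# 0.01 is an arbitrary cut-off chosen for this example.
quasi_filter = VarianceThreshold(threshold=0.01)
quasi_filter.fit(df)
kept_quasi = df.columns[quasi_filter.get_support()].tolist()

print(kept_constant)
print(kept_quasi)
```

The right threshold for quasi-constant features depends on the scale of the data, so it is worth checking which columns a given value removes before committing to it.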
1.2 Multivariate filter methods:
Multivariate filter methods are capable of removing redundant features from the data, since they take the mutual relationship between the features into account. They can be used to remove duplicate and correlated features from the data.
- Correlation between continuous and categorical variable
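The removal of duplicate and correlated features mentioned in this section can be sketched with pandas. This is one possible approach under stated assumptions: the data is synthetic, all features are continuous, Pearson correlation is used, and the 0.9 cut-off is arbitrary. It does not cover the continuous-versus-categorical case.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "a_copy": a,                                      # exact duplicate of "a"
    "a_noisy": a + rng.normal(scale=0.01, size=200),  # almost perfectly correlated with "a"
    "b": b,                                           # independent feature
})

# 1) Drop duplicate columns: transpose so that columns become rows.
deduped = df.T.drop_duplicates().T

# 2) Drop one feature from every highly correlated pair.
corr = deduped.corr().abs()
# Keep only the upper triangle so that each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
threshold = 0.9  # arbitrary cut-off chosen for this example
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = deduped.drop(columns=to_drop)

print(list(reduced.columns))
```

Which member of a correlated pair is dropped depends only on column order here; in practice one might prefer to keep the feature that is more strongly related to the target.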
2. Using Algorithms/Models
Three methods to perform feature selection using ML models:
- SHAP values
Built-in (impurity-based) feature importance is biased, i.e. it tends to inflate the importance of continuous or high-cardinality categorical variables.
Example: Random Forest Feature Importance computed in 3 Ways with Python: built-in, permutation & shap
Permutation importance: Record a baseline accuracy (classifier) or R2 score (regressor) by passing a validation set or the out-of-bag (OOB) samples through the Random Forest. Permute the column values of a single predictor feature, then pass all test samples back through the Random Forest and recompute the accuracy or R2. The importance of that feature is the drop in overall accuracy or R2 caused by permuting the column, i.e. the difference between the baseline and the recomputed score. The permutation mechanism is much more computationally expensive than the mean decrease in impurity mechanism, but the results are more reliable.
For RF: Notice that the function does not normalize the importance values, such as dividing by the standard deviation. According to Conditional variable importance for random forests, “the raw [permutation] importance… has better statistical properties.” These importance values will not sum to one, and it’s important to remember that we don’t care what the values are per se. What we care about is the relative predictive strengths of the features. (When using the importance() function in R, make sure to use scale=F to prevent this normalization.)
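The permutation procedure described above can be sketched by hand as follows. This is a minimal illustration, not the implementation from the referenced article: the helper name permutation_importances, the synthetic dataset, and the use of a held-out validation split (rather than OOB samples) are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): with shuffle=False the first 3 columns
# are the informative features and the last 3 are pure noise.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3,
    n_redundant=0, shuffle=False, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)


def permutation_importances(model, X_val, y_val, rng):
    """Hypothetical helper: drop in accuracy when each column is permuted."""
    baseline = model.score(X_val, y_val)  # baseline accuracy
    importances = np.zeros(X_val.shape[1])
    for j in range(X_val.shape[1]):
        X_perm = X_val.copy()
        # Shuffle one column to break its relationship with the target.
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        importances[j] = baseline - model.score(X_perm, y_val)
    return importances


rng = np.random.default_rng(0)
perm_imp = permutation_importances(model, X_val, y_val, rng)  # raw, not normalized
mdi_imp = model.feature_importances_  # built-in mean decrease in impurity

print("permutation:", np.round(perm_imp, 3))
print("built-in   :", np.round(mdi_imp, 3))
```

The model is scored once per feature, which is where the extra computational cost comes from. scikit-learn also provides sklearn.inspection.permutation_importance, which repeats the shuffling several times and averages the results.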