What is feature scaling?
Remember the mensuration problems we solved as children, where the same quantity could appear in different units, such as oil measured in litres or in millilitres?
Before solving, we would convert everything to a single unit.
A similar idea applies to machine learning features: a dataset can contain features with different units and widely varying magnitudes. The technique that brings them onto a common footing is called feature scaling.
Why is feature scaling required?
A feature has two components: its magnitude (the value itself) and its unit (the measure it was recorded in). A dataset usually contains many features with different magnitudes and units.
Many machine learning algorithms use the distance between two data points for computation. Left alone, they only see the magnitudes of the features and ignore the units, so the result changes drastically between units: 5 kg and 5000 g are the same quantity, yet 5000 looks a thousand times larger than 5. Features with large magnitudes therefore dominate the distance calculation and drown out features with small magnitudes, hiding the original pattern in the data. To suppress this effect, we need to bring all features to the same level of magnitude. This is achieved by scaling.
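A small sketch of this effect, assuming NumPy is available (the weight/height values below are made up for illustration): the same pair of points gives wildly different Euclidean distances depending only on the unit chosen for one feature.

```python
import numpy as np

# Two samples described by (weight, height in metres).
a = np.array([5000.0, 1.80])   # weight expressed in grams
b = np.array([5500.0, 1.60])

# Unscaled Euclidean distance: the gram-valued feature dominates.
d_grams = np.linalg.norm(a - b)

# Re-express weight in kilograms and the picture changes completely.
a_kg = np.array([5.0, 1.80])
b_kg = np.array([5.5, 1.60])
d_kg = np.linalg.norm(a_kg - b_kg)

print(d_grams)  # ~500: height contributes almost nothing
print(d_kg)     # ~0.54: both features now contribute
```

The underlying objects are identical; only the bookkeeping changed, which is why distance-based algorithms need scaled inputs.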
Which machine learning algorithms require scaling?
- KNN and KMeans:- Both rely on Euclidean distance, so all numerical features should be scaled to carry equal weight.
- PCA:- PCA looks for the directions of maximum variance, and variance is larger for high-magnitude features. Unscaled data therefore skews PCA towards high-magnitude features.
- Gradient Descent based Algorithms:- Linear regression, logistic regression, and ANNs converge faster with scaled features.
- Naïve Bayes and LDA:- They handle the weighting of features internally, so scaling has little effect.
- Tree-Based Algorithms:- Decision trees, random forests, and boosting algorithms do not use distance in their computation, so scaling is not required.
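The KNN point above can be demonstrated on synthetic data (a sketch, assuming scikit-learn is installed; the dataset is fabricated so that the class depends only on a small-scale feature while a large-scale feature is pure noise):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400
informative = rng.uniform(0, 1, n)     # small magnitude, decides the class
noise = rng.uniform(0, 1000, n)        # large magnitude, carries no signal
X = np.column_stack([informative, noise])
y = (informative > 0.5).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)

# Without scaling, distances are dominated by the noisy feature.
acc_raw = knn.fit(X_tr, y_tr).score(X_te, y_te)

# With scaling, both features contribute equally to the distance.
scaler = StandardScaler().fit(X_tr)
acc_scaled = knn.fit(scaler.transform(X_tr), y_tr).score(
    scaler.transform(X_te), y_te)

print(acc_raw, acc_scaled)  # scaled accuracy is far higher
```

The unscaled model hovers near chance because its nearest neighbours are chosen almost entirely by the irrelevant large-magnitude feature.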
What are the different types of feature scaling?
- Standardization:- It replaces each value with its z-score, z = (x − mean) / standard deviation.
This redistributes the feature so that its mean is 0 and its standard deviation is 1. A Python implementation is available in the sklearn library as StandardScaler.
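A minimal usage sketch with sklearn's StandardScaler (the toy column of values is invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One feature, five samples (sklearn expects a 2-D array).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

scaler = StandardScaler()
X_std = scaler.fit_transform(X)

print(X_std.mean())  # ~0
print(X_std.std())   # ~1
```

fit_transform learns the mean and standard deviation from the data and applies z-scoring in one step; on new data you would call transform with the already-fitted scaler.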
- Mean Normalization:- This scaling, x' = (x − mean) / (max − min), brings the distribution into the range −1 to 1 with mean = 0.
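Mean normalization has no dedicated sklearn transformer, but it is a one-liner with NumPy (a sketch on invented values):

```python
import numpy as np

X = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Mean normalization: (x - mean) / (max - min)
X_norm = (X - X.mean()) / (X.max() - X.min())

print(X_norm)  # centred on 0, spread within [-1, 1]
```

Here the mean is 6 and the range is 8, so the values become [-0.5, -0.25, 0, 0.25, 0.5].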
Standardization and mean normalization are used for algorithms that assume zero-centred data, such as PCA.
- MinMax Scaler:- This technique, x' = (x − min) / (max − min), brings the values into the range 0 to 1.
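sklearn provides this as MinMaxScaler; a minimal sketch on invented values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0], [20.0], [30.0], [40.0]])

X_mm = MinMaxScaler().fit_transform(X)

print(X_mm.ravel())  # smallest value maps to 0, largest to 1
```

The minimum of the column maps to 0 and the maximum to 1, with the remaining values spaced linearly in between.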
When should you normalize or standardize your data?
Normalization is a good technique to use when you do not know the distribution of your data or when you know the distribution is not Gaussian (a bell curve). Normalization is useful when your data has varying scales and the algorithm you are using does not make assumptions about the distribution of your data, such as k-nearest neighbours and artificial neural networks.
Standardization assumes that your data has a Gaussian (bell curve) distribution. This does not strictly have to be true, but the technique is more effective if your attribute distribution is Gaussian. Standardization is useful when your data has varying scales and the algorithm you are using does make assumptions about your data having a Gaussian distribution, such as linear regression, logistic regression, and linear discriminant analysis.