What is L2 regularization or L2 Normalization or Ridge?
A neural network with large weights is a sign of an unstable network: applying minor changes to the input can produce large changes in the output, which leads to complexity and over-fitting in the model. Regularization discourages this by adding a penalty to the loss function, which keeps the weights small.
The error function (E) here is the Mean Square Error: E = (1/n) ∑ (y − ŷ)²
L2 regularization or Ridge = Error + Penalty = E + λ∑w²
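As a sketch in Python, the Ridge loss is just the MSE plus lambda times the sum of squared weights (the value of `lam` below is a hypothetical illustration, not a recommended setting):

```python
import numpy as np

def ridge_loss(y_true, y_pred, weights, lam=0.01):
    """Mean squared error plus an L2 penalty on the weights.
    `lam` (lambda) controls the regularization strength."""
    mse = np.mean((y_true - y_pred) ** 2)   # Error term (E)
    penalty = lam * np.sum(weights ** 2)    # Penalty term: lambda * sum(w^2)
    return mse + penalty

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.1, 1.9, 3.2])
w = np.array([0.5, -0.3])
print(ridge_loss(y_true, y_pred, w))
```

Larger `lam` makes the penalty dominate, pushing the optimizer toward smaller weights.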
Weight decay is a regularization technique in which each weight is multiplied by a number slightly less than 1 (like 0.95). This stops the weights from growing larger at every epoch.
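A minimal sketch of weight decay as described, using the 0.95 factor from the text:

```python
import numpy as np

def decay_weights(weights, decay=0.95):
    """Multiply every weight by a factor slightly below 1 each epoch,
    so weights shrink toward zero instead of growing unboundedly."""
    return weights * decay

w = np.array([2.0, -4.0, 0.5])
for epoch in range(3):
    w = decay_weights(w)
print(w)  # first weight is now 2.0 * 0.95**3 ≈ 1.71475
```

In practice this multiplicative step would run alongside the usual gradient update.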
What is L1 regularization or L1 Normalization or Lasso?
It is similar to L2 regularization, with one difference: we penalize the absolute value of the weights, |w|. This pushes the parameters toward zero, so most weights become exactly zero and only a few remain non-zero.
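One common way to see why L1 produces exactly-zero weights is its soft-thresholding step (the proximal operator of the L1 penalty), sketched here for illustration:

```python
import numpy as np

def soft_threshold(w, lam):
    """Shrink each weight by `lam` and set it to exactly zero if its
    magnitude is below `lam`. This is why Lasso yields sparse weights."""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([0.8, -0.05, 0.03, -1.2, 0.01])
print(soft_threshold(w, lam=0.1))
# small weights become exactly 0; large ones shrink by 0.1
```

Compare this with L2, which only scales weights down and never sets them exactly to zero.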
Difference between L2 regularization (Ridge) and L1 regularization (Lasso)
Weight:
Ridge penalizes the sum of the squared weights (∑w²).
Lasso penalizes the sum of the absolute values of the weights (∑|w|).
Solution:
Ridge has a single unique solution.
Lasso can have many solutions.
Feature Selection:
Ridge has no feature selection.
Lasso has built-in feature selection: it drives irrelevant weights to exactly zero.
Complex Learning:
Ridge is able to learn complex data patterns.
Lasso cannot learn complex patterns.
Robustness:
Ridge is not robust to outliers, because squaring amplifies the effect of large errors.
Lasso is robust to outliers.
Prediction:
Ridge generally gives better predictions.
Lasso is not as good at prediction.
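The difference in how the two penalties shrink weights can be sketched with plain gradient steps on the penalty term alone (the learning rate and lambda below are illustrative values):

```python
import numpy as np

# Ridge's penalty gradient is proportional to the weight (2*lam*w), so the
# weight decays geometrically toward zero but never reaches it exactly.
# Lasso's penalty gradient is a constant force (lam*sign(w)), so a small
# weight is driven all the way to exactly zero.
lam, lr = 0.1, 0.1
w_ridge, w_lasso = 1.0, 1.0
for _ in range(200):
    w_ridge -= lr * 2 * lam * w_ridge
    step = lr * lam * np.sign(w_lasso)
    # stop at zero instead of oscillating around it
    w_lasso = w_lasso - step if abs(w_lasso) > abs(step) else 0.0
print(w_ridge, w_lasso)  # Ridge weight is small but non-zero; Lasso weight is exactly 0.0
```

This is the mechanism behind Lasso's built-in feature selection noted above.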
What is Elastic Regularization?
Elastic regularization combines both the Ridge and Lasso penalties in one loss: Error + λ₁∑|w| + λ₂∑w².
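A minimal sketch of the elastic net loss, combining both penalties (the strengths `l1` and `l2` are hypothetical illustration values):

```python
import numpy as np

def elastic_net_loss(y_true, y_pred, weights, l1=0.01, l2=0.01):
    """MSE plus both the Lasso (L1) and Ridge (L2) penalties."""
    mse = np.mean((y_true - y_pred) ** 2)
    return mse + l1 * np.sum(np.abs(weights)) + l2 * np.sum(weights ** 2)

y_true = np.array([1.0, 2.0])
y_pred = np.array([1.0, 2.0])
w = np.array([1.0, -2.0])
print(elastic_net_loss(y_true, y_pred, w))
```

Tuning the ratio of `l1` to `l2` trades off Lasso-style sparsity against Ridge-style smooth shrinkage.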