A caveat is a warning about a specific stipulation, condition or limitation.
Examples: Dying ReLU, Vanishing Gradient, Hard to break symmetry & Vanilla SGD.
Dying ReLU: Much depends on how we initialize the weights. If a neuron is not initialized correctly, it may never fire: when its input is negative, ReLU outputs zero and its derivative is also zero, so the weights feeding that neuron are not updated. Hence a major part of the network could consist of dead ReLUs that contribute nothing to learning.
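To make the dying ReLU idea concrete, here is a minimal sketch of a single neuron. The weight, bias, inputs, targets and learning rate are all made-up values chosen so that the input to ReLU is negative for every sample; they are assumptions for illustration, not taken from any real network.

```python
def relu(z):
    # ReLU: 0 for negative input, z for positive input
    return max(0.0, z)

def relu_grad(z):
    # Derivative of ReLU: 1 for positive input, 0 otherwise
    return 1.0 if z > 0 else 0.0

# Hypothetical neuron with a badly chosen (large negative) bias
w, b = 0.5, -10.0
inputs = [0.1, 1.0, 2.5, 4.0]
targets = [1.0, 1.0, 1.0, 1.0]
lr = 0.1  # assumed learning rate

def sgd_step(w, b):
    # One gradient step on a squared-error loss, summed over all samples
    grad_w = grad_b = 0.0
    for x, t in zip(inputs, targets):
        z = w * x + b
        y = relu(z)
        dz = 2.0 * (y - t) * relu_grad(z)  # chain rule through ReLU
        grad_w += dz * x
        grad_b += dz
    return w - lr * grad_w, b - lr * grad_b, grad_w, grad_b

outputs = [relu(w * x + b) for x in inputs]
new_w, new_b, grad_w, grad_b = sgd_step(w, b)
print("outputs:", outputs)
print("gradients:", grad_w, grad_b)
print("weights before/after:", (w, b), (new_w, new_b))
```

Every output is zero, so every gradient is zero and the step leaves the weight and bias unchanged. The neuron stays dead no matter how many steps are taken on this data.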
Vanishing Gradient problem: It occurs when the slope (gradient) becomes too small. During backpropagation the gradient typically shrinks exponentially as it passes back through the layers, and at that point the network is no longer in a position to learn. It is more likely to occur when the network parameters are not properly set.
A decade back, the vanishing gradient problem was a major problem while training a Deep Neural Network model with gradient-based optimization and saturating activation functions like Sigmoid (vanishing gradient, not zero-centered) & Tanh (vanishing gradient, zero-centered).
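Zero-centered means: the outputs are balanced around zero, so zero input gives zero output (tanh(0) = 0)
Non-zero-centered means: the outputs are not balanced around zero, so zero input still gives an output (sigmoid(0) = 0.5)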
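The difference can be checked numerically. The sketch below uses only Python's standard `math` module; the list of sample inputs is an arbitrary choice for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Arbitrary sample inputs, symmetric around zero
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
sig_out = [sigmoid(x) for x in xs]
tanh_out = [math.tanh(x) for x in xs]

# Output at zero input
print("sigmoid(0) =", sigmoid(0.0))
print("tanh(0)    =", math.tanh(0.0))

# Average output over the symmetric inputs
mean_sig = sum(sig_out) / len(sig_out)
mean_tanh = sum(tanh_out) / len(tanh_out)
print("mean sigmoid output:", mean_sig)
print("mean tanh output:   ", mean_tanh)
```

For inputs that are symmetric around zero, the tanh outputs average out to about zero, while the sigmoid outputs are all positive and average out to about 0.5.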
In the Sigmoid function the gradient is very flat: its derivative is never larger than 0.25 and is close to zero for large negative or positive inputs. Backpropagation multiplies these derivatives together layer by layer, so if we have lots of numbers which are close to zero and we multiply them all together, the update to the 'weight' is going to be very tiny, and the backpropagation algorithm will effectively stop learning.
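A rough sketch of this multiplication effect is given below. It is a simplification, not a real network: it assumes every layer sees the same pre-activation value `z` and that all weights are 1, so only the sigmoid derivatives are multiplied. The function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s), largest at x = 0
    s = sigmoid(x)
    return s * (1.0 - s)

def chained_gradient(depth, z=0.0):
    # Product of `depth` sigmoid derivatives, one per layer
    # (assumes weights of 1 and the same z at every layer)
    g = 1.0
    for _ in range(depth):
        g *= sigmoid_grad(z)
    return g

for depth in (1, 5, 10, 20):
    print(depth, "layers ->", chained_gradient(depth))
```

Even in the best case for sigmoid (z = 0, where the derivative is at its maximum), the product shrinks by a factor of four per layer, so the gradient reaching the early layers of a deep network is extremely small.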
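In ReLU, y = 0 for all negative inputs and y = x for positive inputs, so the gradient is exactly 1 for positive inputs and does not shrink as it is passed back through the layers. Hence nowadays we use ReLU activation functions (not zero-centered) in training a Deep Neural Network model to reduce this problem.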
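The contrast with sigmoid can be sketched in the same simplified way: multiply one activation derivative per layer, assuming all weights are 1 and every layer sees the same pre-activation. The depth of 20 and the pre-activation values of 1.0 and -1.0 are arbitrary illustrative choices.

```python
import math

def relu_grad(z):
    # Derivative of ReLU: 1 for positive input, 0 otherwise
    return 1.0 if z > 0 else 0.0

def sigmoid_grad(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def chained(grad_fn, z, depth):
    # Product of `depth` activation derivatives
    # (assumes weights of 1 and the same z at every layer)
    g = 1.0
    for _ in range(depth):
        g *= grad_fn(z)
    return g

depth = 20
relu_chain = chained(relu_grad, 1.0, depth)        # positive input
sigmoid_chain = chained(sigmoid_grad, 1.0, depth)  # same input, sigmoid
negative_chain = chained(relu_grad, -1.0, depth)   # negative input

print("ReLU, positive input:   ", relu_chain)
print("Sigmoid, positive input:", sigmoid_chain)
print("ReLU, negative input:   ", negative_chain)
```

With positive inputs the ReLU product stays at 1 while the sigmoid product collapses toward zero. The last line is the flip side: with negative inputs the ReLU gradient is zero, which is the dying ReLU caveat described earlier.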