Welcome to CS With James

In this tutorial I will discuss the vanishing gradient problem and how it can be solved by using the ReLU activation function.

**The vanishing gradient problem** occurs when a network becomes too deep and its results turn out worse than those of a shallower network. It is caused by the sigmoid activation function.

A neural network is trained by backpropagation, but if the network is too deep and uses the sigmoid activation function, the first several layers barely get trained.
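To see why, it helps to trace a single gradient through a deep chain of sigmoid units. The sketch below is my own illustration (hypothetical weights, not the network from this tutorial): backpropagation multiplies one sigmoid derivative and one weight per layer, so the gradient shrinks rapidly on its way back to the first layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A hypothetical 10-layer chain of single sigmoid units with random weights.
depth = 10
weights = rng.normal(scale=0.5, size=depth)

# Forward pass: each layer computes sigmoid(w * previous_output).
activations = [1.0]
for w in weights:
    activations.append(sigmoid(w * activations[-1]))

# Backward pass: the chain rule multiplies one sigmoid derivative
# (at most 0.25) and one weight per layer, so the gradient shrinks fast.
grad = 1.0
for w, a in zip(weights[::-1], activations[::-1]):
    grad *= a * (1.0 - a) * w   # d sigmoid(z)/dz = s(z) * (1 - s(z))
    print(f"|gradient| after this layer: {abs(grad):.2e}")
```

By the time the gradient reaches the first layer it is many orders of magnitude smaller than at the output, which is exactly why those layers stop learning.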

This is a visualization of the sigmoid function:

The problem is that the sigmoid function squeezes its output between 0 and 1, and its derivative is never larger than 0.25. During backpropagation these small derivatives are multiplied together, one per layer, so the gradient that reaches the first layers becomes vanishingly small and their weights barely change.
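The 0.25 bound is easy to check numerically (a small sketch of my own, not code from the tutorial):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    # d sigmoid(x)/dx = s(x) * (1 - s(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0, where it equals exactly 0.25.
xs = np.linspace(-10.0, 10.0, 10001)
print(sigmoid_prime(xs).max())
```

A product of factors that are all at most 0.25 shrinks exponentially with depth, which is the vanishing gradient in a nutshell.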

That is why the deeper network turns out worse than the shallow network.

This is a good visualization of the vanishing gradient problem:

The first few layers are faded out: they receive almost no gradient, so they barely contribute to the result of the network.

Code for Vanishing Gradient Problem
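The original code is not reproduced here, so as a stand-in, here is a minimal NumPy sketch of the same effect: a deep stack of sigmoid layers on a toy dataset (my own hypothetical data, not the dataset from the experiment), with one backward pass showing how much smaller the first layer's gradient is than the last layer's.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary-classification data (a hypothetical stand-in for the
# dataset used in the original experiment).
X = rng.normal(size=(64, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)

# A deep stack of sigmoid layers.
sizes = [2, 16, 16, 16, 16, 16, 16, 1]
Ws = [rng.normal(scale=0.5, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]

# Forward pass.
acts = [X]
for W in Ws:
    acts.append(sigmoid(acts[-1] @ W))

# Backward pass (mean squared error loss, for simplicity).
delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
grads = [None] * len(Ws)
for i in reversed(range(len(Ws))):
    grads[i] = acts[i].T @ delta
    if i > 0:
        delta = (delta @ Ws[i].T) * acts[i] * (1 - acts[i])

first = np.abs(grads[0]).mean()
last = np.abs(grads[-1]).mean()
print(f"mean |grad|, first layer: {first:.2e}  last layer: {last:.2e}")
```

The first layer's gradient is a tiny fraction of the last layer's, so gradient descent effectively never updates the early weights.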

The result is 57.93% accuracy, which is far worse than the network with 3 layers.

**Solution**

**ReLU Activation Function.** ReLU stands for Rectified Linear Unit: it outputs max(0, x), passing positive inputs through unchanged and clamping negative inputs to zero.

This is a visualization of the ReLU activation function:
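In code, ReLU and its derivative are trivial (my own minimal definitions, matching the figure). The key point is that the derivative is exactly 1 for positive inputs, so gradients flowing back through active units are not shrunk at all.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_prime(x):
    # The derivative is 1 for positive inputs and 0 otherwise, so
    # gradients pass through active units without being scaled down.
    return (x > 0).astype(float)

xs = np.array([-2.0, -0.5, 0.5, 3.0])
print(relu(xs))        # [0.  0.  0.5 3. ]
print(relu_prime(xs))  # [0. 0. 1. 1.]
```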

Code for Deep Neural Network with ReLU Activation Function
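Again as a hedged stand-in for the original code, the sketch below compares the first-layer gradient of the same toy deep network with sigmoid hidden layers versus ReLU hidden layers (the dataset, sizes, and function names are all my own assumptions, not the tutorial's):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_layer_grad(act, act_prime, depth=6, width=16, seed=42):
    """Mean |gradient| at the first layer after one backward pass."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 2))                        # toy inputs
    y = (X[:, 0] * X[:, 1] > 0).astype(float).reshape(-1, 1)
    sizes = [2] + [width] * depth + [1]
    Ws = [rng.normal(scale=np.sqrt(2.0 / a), size=(a, b))
          for a, b in zip(sizes[:-1], sizes[1:])]
    # Forward: hidden layers use `act`, the output layer uses sigmoid.
    pre, acts = [], [X]
    for i, W in enumerate(Ws):
        z = acts[-1] @ W
        pre.append(z)
        acts.append(sigmoid(z) if i == len(Ws) - 1 else act(z))
    # Backward (mean squared error loss).
    delta = (acts[-1] - y) * acts[-1] * (1 - acts[-1])
    for i in reversed(range(1, len(Ws))):
        delta = (delta @ Ws[i].T) * act_prime(pre[i - 1])
    return np.abs(X.T @ delta).mean()

sig_grad = first_layer_grad(sigmoid, lambda z: sigmoid(z) * (1 - sigmoid(z)))
relu_grad = first_layer_grad(lambda z: np.maximum(0.0, z),
                             lambda z: (z > 0).astype(float))
print(f"first-layer |grad| with sigmoid: {sig_grad:.2e}, "
      f"with ReLU: {relu_grad:.2e}")
```

With ReLU the first layer receives a much larger gradient than with sigmoid, so every layer of the deep network actually gets trained.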

The result gets much better: 94.61% accuracy. That is not perfect, but with the ReLU activation function the deep neural network works.