We divide the data into mini-batches of a chosen batch size and pass each batch through the network. Batch normalization normalizes a neuron's activations across all the samples in the mini-batch, so that the output has a mean close to 0 and a standard deviation close to 1. It also introduces two learnable parameters, gamma and beta, which are optimized during training along with the rest of the network.
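The computation above can be sketched in a few lines of plain Python. This is an illustrative, framework-free sketch (the function name, sample values, and epsilon are made up for the demo, not a library API): it normalizes one neuron's pre-activations across a mini-batch, then applies the learned scale (gamma) and shift (beta).

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one neuron's activations across a mini-batch,
    then scale by gamma and shift by beta."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    # eps guards against division by zero when the variance is tiny
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

activations = [2.0, 4.0, 6.0, 8.0]   # one neuron, mini-batch of 4 (toy values)
normed = batch_norm(activations)
# With gamma=1 and beta=0 the output mean is ~0 and the std is ~1
```

With gamma = 1 and beta = 0 the output is simply standardized; during training the network is free to learn other values of gamma and beta, so it can undo the normalization if that helps.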
Advantages of Batch Normalization Layer
Batch normalization speeds up training and can improve the accuracy of the neural network.
It reduces the network's sensitivity to weight initialization.
It also has a mild regularization effect on the network.
It works well with Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs).
Disadvantages of Batch Normalization Layer
Batch normalization depends on the mini-batch size: if the mini-batch is small, the batch statistics are noisy and normalization has little to no benefit.
If training does not use mini-batches at all, as in purely online learning with one sample at a time, it cannot be applied.
Batch normalization does not work well with Recurrent Neural Networks (RNNs).
Layer Normalization
This technique does not depend on batches: normalization is applied to a single instance, across all of its features. Here too, the mean of the activations stays close to 0 and the standard deviation stays close to 1.
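To make the contrast with batch normalization concrete, here is a minimal sketch in the same style (illustrative names and toy values, not a library API): the statistics are computed over one sample's own features, so no other samples in a batch are needed.

```python
import math

def layer_norm(features, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a single instance across its features,
    then scale by gamma and shift by beta."""
    mean = sum(features) / len(features)
    var = sum((x - mean) ** 2 for x in features) / len(features)
    # eps guards against division by zero when the variance is tiny
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in features]

sample = [1.0, 3.0, 5.0, 7.0]   # one instance with four features (toy values)
normed = layer_norm(sample)     # works even when the batch size is 1
```

Because each sample is normalized independently, the result is identical whether the batch holds one sample or a thousand, which is exactly why layer normalization avoids the small-batch problem described above.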
Advantages of Layer Normalization
It does not depend on the batch size during training.
It works well with Recurrent Neural Networks (RNNs).
Disadvantages of Layer Normalization
It may not produce good results with Convolutional Neural Networks (CNNs).