Publication record · 18.cifr/2015.ioffe.batch-normalization
Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch.
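As a rough illustration of the core idea described in the abstract, the sketch below normalizes each feature over a training mini-batch and then applies a learned scale and shift. The function name, shapes, and NumPy usage are illustrative assumptions, not the authors' implementation; a real layer would also track running statistics for inference.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Minimal batch-norm forward pass for a 2-D activation matrix.

    x:     (batch_size, num_features) pre-activations
    gamma: (num_features,) learned scale
    beta:  (num_features,) learned shift
    """
    # Per-feature statistics computed over the mini-batch dimension.
    mu = x.mean(axis=0)
    var = x.var(axis=0)

    # Normalize, then restore representational capacity with gamma/beta.
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# Toy usage: a batch of 4 examples with 3 features each.
x = np.random.randn(4, 3) * 10 + 5
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0), out.std(axis=0))  # roughly 0 and 1 per feature
```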
The authors note that Batch Normalization's benefits for recurrent networks were not fully explored, since applying BN across the time steps of an RNN is non-trivial. Extending the approach to very small mini-batches or purely online settings also remained an open challenge, one that later motivated the Layer Normalization and Group Normalization variants.
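For intuition on why small mini-batches are problematic, the sketch below contrasts the axis over which statistics are computed: Batch Normalization averages over examples, while Layer Normalization averages over features. The helper function and toy shapes are assumptions for illustration, not code from any of the cited papers.

```python
import numpy as np

def normalize(x, axis, eps=1e-5):
    """Normalize x to zero mean / unit variance along the given axis."""
    mu = x.mean(axis=axis, keepdims=True)
    var = x.var(axis=axis, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.randn(2, 6)  # tiny batch: 2 examples, 6 features

# Batch Normalization: statistics over the batch dimension (axis 0),
# so the estimates become noisy when the mini-batch is very small.
bn = normalize(x, axis=0)

# Layer Normalization: statistics over the feature dimension (axis 1),
# independent of batch size, so it also applies online or per time step.
ln = normalize(x, axis=1)
```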