Publication record · 18.cifr/2014.kingma.adam-optimizer
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has low memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters.
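As a rough illustration of the update based on adaptive moment estimates, here is a minimal sketch in NumPy; the function name adam_step, the toy quadratic objective, and the NumPy setup are illustrative, while the default hyperparameters (alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8) follow the paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient and its
    square, with bias correction for their initialization at zero."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment (uncentered variance) estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Illustrative usage on a toy objective f(theta) = ||theta||^2 / 2.
theta = np.array([1.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 1001):
    grad = theta                                # gradient of the toy objective
    theta, m, v = adam_step(theta, grad, m, v, t)
```

Because the per-parameter step is scaled by the square root of the second-moment estimate, the effective step size adapts to each coordinate's gradient magnitude, which is what makes the method invariant to diagonal rescaling of the gradients.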
The regret bound applies only to convex objectives; convergence guarantees for non-convex deep learning settings remain open. The paper flags distributed and asynchronous variants of Adam as natural extensions. The relationship between Adam's adaptivity and its generalization gap relative to SGD remains an unresolved empirical and theoretical question.
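For context, the convex-case guarantee is stated in terms of online regret against the best fixed parameter in hindsight; a sketch of the definition and the rate claimed in the paper, with f_t the convex loss at round t:

```latex
% Regret after T rounds (definition), and the convex-case rate claimed in the paper
R(T) = \sum_{t=1}^{T} f_t(\theta_t) - \min_{\theta} \sum_{t=1}^{T} f_t(\theta),
\qquad
R(T) = O(\sqrt{T}) \;\;\Longrightarrow\;\; \frac{R(T)}{T} = O\!\left(\frac{1}{\sqrt{T}}\right).
```

The vanishing average regret R(T)/T is what convergence means in this online convex setting; no analogous guarantee is given for non-convex objectives.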