Publication record · 18.cifr/2020.huang.resnet-ntk

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective

v1.0.0

Kaixuan Huang (Georgia Institute of Technology), Yuqing Wang (Georgia Institute of Technology), Molei Tao (Georgia Institute of Technology), Tuo Zhao (Georgia Institute of Technology)

RAI18.cifr/2020.huang.resnet-ntk

NeurIPS 2020· 2020· doi:10.48550/arXiv.2002.06262

Deep residual networks (ResNets) have demonstrated better generalization performance than deep feedforward networks (FFNets). However, the theory behind such a phenomenon is still largely unknown. This paper studies this fundamental problem in deep learning from a so-called neural tangent kernel perspective. Specifically, we first show that under proper conditions, as the width goes to infinity, training deep ResNets can be viewed as learning reproducing kernel functions with some kernel function. We then compare the kernel of deep ResNets with that of deep FFNets and discover that the class of functions induced by the kernel of FFNets is asymptotically not learnable, as the depth goes to infinity. In contrast, the class of functions induced by the kernel of ResNets does not exhibit such degeneracy. Our discovery partially justifies the advantages of deep ResNets over deep FFNets in generalization abilities. Numerical results are provided to support our claim.

neural tangent kernelresidual networksfeedforward networksgeneralizationdeep learning theorykernel degeneracy

✦ Research context

What this agent contributes to the literature.

Problem solved

Before this paper, there was no theoretical explanation for why ResNets consistently outperform FFNets in generalization at large depth. The NTK framework existed but had not been used to compare degeneracy across architectures as depth scales. This work fills that gap with a concrete mathematical criterion: eigenvalue collapse of the kernel Gram matrix.

Novelty

The paper proves that the NTK of deep FFNets degenerates as depth grows — the induced RKHS becomes asymptotically unlearnable — while the ResNet NTK remains non-degenerate due to skip connections. This is the first rigorous NTK-based theoretical justification for ResNet's superior generalization. The result directly ties architecture (skip connections) to preserved kernel expressivity in the infinite-width, infinite-depth limit.

Related research

Computing related research...

Canvas contract1-in / 1-out · unpacked into arch_params, eval_points legacy ports

Sample data

Loading sample data...

Total calls

This month

Citations

Last called

—

Image digest

sha256:4f6ef4449f53bd367cbd03e13c2316c74ddfc77bd1702d75b75244f33f8f41b2

Invoke command

python main.py

Inputs

input:application/json

Outputs

output:application/json

Citation

Loading DOI…

Invoke

CPU compute only

How to get GPU access: Your university, lab, or company can become a CIFR institutional member. Members get GPU-accelerated runs for all their researchers. Contact us

Pre-filled with the paper's canonical scenario. Click Invoke agent to reproduce the original result, or edit the JSON below to run a counterfactual.

inputapplication/jsonoptional

Unified canvas input with architecture params and evaluation points

Leave empty to run the paper's canonical scenario.

Recent invocations(0)

No invocations yet — be the first to call this agent.