Deep Residual Learning for Image Recognition
I am reading this paper because it is on the list of roughly 30 papers that Ilya Sutskever recommended to John Carmack as covering what really matters for machine learning / AI today. This paper "presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously".
Reference Link to PDF of Paper
0.1 Abstract
Deeper neural networks are more difficult to train. This paper presents a residual learning framework to ease the training of networks that are substantially deeper than those used previously. The paper reformulates layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. It provides comprehensive empirical evidence that these residual networks are easier to optimize and can gain accuracy from considerably increased depth. The depth of representations is of central importance for many visual recognition tasks.
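To make the reformulation concrete: if $\mathcal{H}(\mathbf{x})$ denotes the desired underlying mapping for a few stacked layers, those layers are instead made to fit the residual function

$$
\mathcal{F}(\mathbf{x}) := \mathcal{H}(\mathbf{x}) - \mathbf{x},
$$

so the original mapping is recovered as $\mathcal{F}(\mathbf{x}) + \mathbf{x}$, where $\mathbf{x}$ is the input to those stacked layers.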
Deep convolutional neural networks have led to a series of breakthroughs in image classification. Deep networks naturally integrate low/mid/high-level features and classifiers in an end-to-end multilayer fashion, and the "levels" of features can be enriched by the number of stacked layers. Network depth is of central importance for difficult tasks. Historically, depth was constrained by the exploding/vanishing gradients problem. That problem has been largely addressed by normalized initialization and intermediate normalization layers, which enable networks with tens of layers to start converging under stochastic gradient descent (SGD) with backpropagation.

When deeper networks are able to start converging, a *degradation* problem is exposed: as network depth increases, accuracy saturates and then degrades rapidly, and this degradation is not caused by overfitting.

The paper addresses the degradation problem by introducing a deep residual learning framework. Instead of hoping that each few stacked layers directly fit a desired underlying mapping, the layers are explicitly made to fit a residual mapping. The paper hypothesizes that it is easier to optimize the residual mapping than the original, unreferenced mapping. The formulation can be realized by feedforward neural networks with "shortcut connections", i.e. connections that skip one or more layers. Here the shortcut connections perform identity mapping, and their outputs are added to the outputs of the stacked layers. Identity shortcut connections add neither extra parameters nor computational complexity. The entire network can still be trained end-to-end by SGD with backpropagation, and can be easily implemented with common libraries without modifying the solvers.
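Below is a minimal sketch of an identity-shortcut residual block in PyTorch. The two 3x3 convolutions, the channel count, and the batch-norm placement are illustrative choices on my part rather than an exact reproduction of the paper's blocks; the point is that the shortcut adds the block's input directly to the stacked layers' output without introducing any parameters.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Identity-shortcut residual block: output = relu(F(x) + x)."""

    def __init__(self, channels: int):
        super().__init__()
        # F(x): the residual function that the stacked layers learn.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                              # identity shortcut: no extra parameters
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                      # add shortcut output to stacked-layer output
        return self.relu(out)

# Usage: the block preserves spatial size and channel count.
block = ResidualBlock(channels=64)
y = block(torch.randn(1, 64, 56, 56))             # y.shape == (1, 64, 56, 56)
```

Because the shortcut is a plain element-wise addition, a stack of such blocks is still an ordinary feedforward network and trains end-to-end with SGD and backpropagation, just as its plain counterpart would.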