Object classification, image recognition
See:
# Resources
- https://cv-tricks.com/cnn/understand-resnet-alexnet-vgg-inception/
- https://blog.paralleldots.com/data-science/must-read-path-breaking-papers-about-image-classification/
# References
- #PAPER AlexNet: ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky 2012)
- This architecture was one of the first deep networks to improve ImageNet classification accuracy by a large margin over traditional methods. It is composed of 5 convolutional layers followed by 3 fully connected layers.
- AlexNet, proposed by Alex Krizhevsky, uses the ReLU (Rectified Linear Unit) as the non-linearity, instead of the Tanh or Sigmoid functions that were the earlier standard for neural networks. It also reduces overfitting by applying Dropout in the fully connected layers. See the sketch below.
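A minimal PyTorch sketch of this layout (5 convolutional layers, 3 fully connected layers, ReLU and Dropout). The class name and exact layer widths are illustrative (torchvision-style); the original paper's local response normalization and grouped convolutions are omitted:

```python
import torch
import torch.nn as nn

# Sketch of an AlexNet-style network: 5 conv layers + 3 fully connected layers,
# ReLU non-linearities and Dropout in the classifier. Simplified, not the exact
# original (no local response normalization, no grouped convolutions).
class AlexNetSketch(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)      # (N, 256, 6, 6) for 224x224 input
        x = torch.flatten(x, 1)
        return self.classifier(x)

# Example: class logits for a batch of two 224x224 RGB images
logits = AlexNetSketch()(torch.randn(2, 3, 224, 224))  # shape (2, 1000)
```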
- #PAPER Visualizing and Understanding Convolutional Networks (Zeiler and Fergus 2013)
- #PAPER Very Deep Convolutional Networks for Large-Scale Image Recognition, VGG16 (Simonyan 2014)
- This architecture is from the VGG group at Oxford. It improves over AlexNet by replacing large kernel-sized filters (11x11 and 5x5 in the first and second convolutional layers, respectively) with multiple 3x3 kernel-sized filters stacked one after another (see the parameter-count sketch below).
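A quick PyTorch sketch of why stacked 3x3 filters help: two 3x3 convolutions cover the same 5x5 receptive field as a single 5x5 convolution, but with fewer parameters and an extra non-linearity in between (channel count of 256 is just an example):

```python
import torch.nn as nn

# Two stacked 3x3 convolutions vs. a single 5x5 convolution at the same width.
channels = 256
single_5x5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
stacked_3x3 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True),
)

def n_params(module):
    return sum(p.numel() for p in module.parameters())

print(n_params(single_5x5))   # 256*256*5*5 + 256   = 1,638,656
print(n_params(stacked_3x3))  # 2*(256*256*3*3 + 256) = 1,180,160
```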
- #PAPER Going Deeper with Convolutions (Szegedy 2015)
- GoogLeNet (Inception-V1, 2015)
- http://nicolovaligi.com/history-inception-deep-learning-architecture.html
- GoogLeNet introduced the inception module, which approximates a sparse CNN with a normal dense construction. Since only a small number of neurons are effective, the width (number of convolutional filters) for a given kernel size is kept small, and convolutions of different sizes (5x5, 3x3, 1x1) are used in parallel to capture details at varied scales. The module also has a so-called bottleneck layer (1x1 convolutions) that massively reduces the computation required. In addition, GoogLeNet replaced the fully connected layers at the end with a simple global average pooling, which averages the channel values across the 2D feature map after the last convolutional layer. A sketch of such a block follows below.
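A simplified inception-style block as a PyTorch sketch (the class name and branch widths are illustrative, loosely following one of the GoogLeNet blocks, not the full architecture), showing the parallel branches, the 1x1 bottlenecks, and the global average pooling that replaces the fully connected layers:

```python
import torch
import torch.nn as nn

# Sketch of an inception-style module: parallel 1x1, 3x3 and 5x5 branches plus a
# pooling branch, with 1x1 "bottleneck" convolutions to cut the channel count
# before the expensive 3x3/5x5 filters. Branch outputs are concatenated on the
# channel dimension.
class InceptionBlockSketch(nn.Module):
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, kernel_size=1), nn.ReLU(inplace=True),  # bottleneck
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, kernel_size=1), nn.ReLU(inplace=True),  # bottleneck
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

# Global average pooling at the end replaces the fully connected layers:
block = InceptionBlockSketch(192, 64, 96, 128, 16, 32, 32)  # 64+128+32+32 = 256 output channels
feat = block(torch.randn(1, 192, 28, 28))                   # (1, 256, 28, 28)
pooled = feat.mean(dim=(2, 3))                              # (1, 256), fed to a single linear classifier
```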
- #PAPER See ResNet in Residual and dense neural networks
- #PAPER See ResNeXt in Residual and dense neural networks
- #PAPER See DenseNet in Residual and dense neural networks
- #PAPER SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (Iandola 2016)
- #PAPER See SENets in CNNs
- #PAPER Aggregated Residual Transformations for Deep Neural Networks (Xie 2017)
- #PAPER Local Relation Networks for Image Recognition (Hu 2019)
- #PAPER Designing Network Design Spaces (Radosavovic 2020)
- #PAPER NFNets: High-Performance Large-Scale Image Recognition Without Normalization (Brock 2021)
- #PAPER Patches Are All You Need? (2021)
- #PAPER CoAtNet: Marrying Convolution and Attention for All Data Sizes (Dai 2021)