Deep Learning Architectures

Deep learning architectures have become increasingly competitive through the worldwide ImageNet competition, with layer counts and success rates changing every year. It would not be wrong to say that the AlexNet architecture started this race. For this reason, the sample layer-by-layer description below is based on the AlexNet architecture, while the other architectures are explained more generally.

AlexNet: A deep convolutional neural network for image classification and winner of the ILSVRC-2012 competition. It consists of eight weight layers, the first five of which are convolutional and the last three fully connected, with pooling and activation layers interleaved among them, plus input and output layers. The AlexNet architecture is designed to classify 1000 objects, and it reduced the top-5 error rate in object identification from 26.2% to 15.3%. The AlexNet architecture is shown in the figure [1].
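
For readers who want to inspect this structure directly, torchvision ships a single-GPU variant of AlexNet. The short sketch below is illustrative only, not the original two-GPU implementation:

```python
import torch
from torchvision import models

# torchvision's single-GPU variant of AlexNet; weights are randomly
# initialized here (no pretrained weights are downloaded).
model = models.alexnet()
print(model)  # lists the five convolutional and three fully connected layers

x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image
print(model(x).shape)            # torch.Size([1, 1000]) -> 1000 classes
```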


AlexNet architecture

The AlexNet diagram shows the problem divided into two parts, with one half running on GPU 1 and the other half on GPU 2. This keeps the communication load low, which helps achieve good overall performance. The two processing channels cross only at the third feature-extraction layer.
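
This two-channel split can be approximated in modern frameworks with grouped convolutions. Below is a minimal PyTorch sketch (an illustration under assumed feature-map sizes, not the original implementation) in which groups=2 keeps the two channel halves independent, much like the per-GPU split in the diagram:

```python
import torch
import torch.nn as nn

# A minimal sketch: groups=2 splits the 96 input channels into two
# independent halves of 48, each convolved with its own set of filters,
# analogous to the GPU 1 / GPU 2 channel split in the diagram.
conv2 = nn.Conv2d(in_channels=96, out_channels=256,
                  kernel_size=5, padding=2, groups=2)

x = torch.randn(1, 96, 27, 27)  # assumed feature-map size after layer 1
y = conv2(x)
print(y.shape)                  # torch.Size([1, 256, 27, 27])
```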

AlexNet layers a) 1st layer b) 2nd layer c) 6th layer

The first layer is a convolutional layer.

The second layer is a Max Pooling layer followed by convolution.

The third, fourth, and fifth layers proceed similarly. The sixth layer is a fully connected layer.

In the sixth layer, the input is flattened into a vector of length 13 x 13 x 128 and multiplied by a weight matrix with 2048 columns:

(13 x 13 x 128) x 2048

Here GEMV (General Matrix Vector Multiply) is used.

Vector X = 1 x (13 x 13 x 128)

Matrix A = (13 x 13 x 128) x 2048

Output: 1 x 2048
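
A minimal NumPy sketch of this GEMV, with the shapes taken from the text and random values standing in for the real weights and activations:

```python
import numpy as np

# Flattened sixth-layer input: a vector of length 13 * 13 * 128 = 21632.
x = np.random.randn(13 * 13 * 128)

# Weight matrix A with 2048 output columns; random placeholder values.
A = np.random.randn(13 * 13 * 128, 2048)

# GEMV: a (1 x 21632) vector times a (21632 x 2048) matrix.
y = x @ A
print(y.shape)  # (2048,) -- the 1 x 2048 output
```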

The seventh and eighth layers proceed similarly.

ZFNet: After AlexNet won the ImageNet competition, ZFNet [2], inspired by that architecture, became the winner of the ImageNet competition in 2013. With this architecture, the error rate in object recognition was reduced to 11.2%. Its difference from AlexNet is that it sets the filter size of the first convolutional layer to 7x7 and the stride to two. The smaller filter size in the first convolutional layer is intended to preserve more of the original pixel information in the input. In addition, it used cross entropy, stochastic gradient descent, and ReLU in its architecture. The ZFNet architecture consists of 7 layers and is shown in the figure.
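
To make the change concrete, here is a small PyTorch sketch contrasting the first convolutional layer of the two architectures (the 96-filter count and 224x224 input are assumptions for illustration):

```python
import torch
import torch.nn as nn

# First convolutional layer in the AlexNet style: 11x11 filters, stride 4.
alexnet_conv1 = nn.Conv2d(3, 96, kernel_size=11, stride=4)

# First convolutional layer in the ZFNet style: smaller 7x7 filters and
# stride 2, preserving more of the original pixel information early on.
zfnet_conv1 = nn.Conv2d(3, 96, kernel_size=7, stride=2)

x = torch.randn(1, 3, 224, 224)  # one 224x224 RGB image (assumed size)
print(alexnet_conv1(x).shape)    # torch.Size([1, 96, 54, 54])
print(zfnet_conv1(x).shape)      # torch.Size([1, 96, 109, 109])
```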

ZFNet architecture

GoogLeNet: GoogLeNet is a complex structure built from Inception modules and the 2014 winner of the ImageNet competition. Unlike previous studies, the depth and width of the network were increased while the computational cost was kept low. The architecture consists of 22 layers. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. In the competition it achieved a top-5 error rate of 6.67%. The GoogLeNet architecture is shown in the figure [3].
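
The sketch below shows a simplified Inception module in PyTorch: four parallel branches of different receptive-field sizes whose outputs are concatenated channel-wise. The channel counts are illustrative assumptions, not a claim about GoogLeNet's exact configuration:

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Simplified Inception module: four parallel branches whose outputs
    are concatenated along the channel dimension."""

    def __init__(self, in_ch):
        super().__init__()
        # Branch 1: 1x1 convolution.
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, 64, kernel_size=1), nn.ReLU())
        # Branch 2: 1x1 reduction, then 3x3 convolution.
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, 96, kernel_size=1), nn.ReLU(),
            nn.Conv2d(96, 128, kernel_size=3, padding=1), nn.ReLU())
        # Branch 3: 1x1 reduction, then 5x5 convolution.
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU())
        # Branch 4: 3x3 max pooling, then 1x1 convolution.
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 32, kernel_size=1), nn.ReLU())

    def forward(self, x):
        # Multi-scale processing: every branch sees the same input at a
        # different receptive-field size; results are stacked channel-wise.
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

x = torch.randn(1, 192, 28, 28)        # assumed input feature map
print(InceptionModule(192)(x).shape)   # torch.Size([1, 256, 28, 28])
```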

GoogLeNet architecture

ResNet: ResNet, which consists of 152 layers, has a deeper structure than previous architectures. It won the ImageNet competition in 2015 with a top-5 error rate of 3.57%. This rate is below the human error rate, which makes it a great success. In the residual blocks that make up the architecture, the input x produces a result F(x) after a convolution-ReLU-convolution sequence. This result is then added to the original input x, expressed as H(x) = F(x) + x. An example 34-layer ResNet architecture and the residual block structure are shown in the figure [4].
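
A minimal PyTorch sketch of the residual block just described (batch normalization and dimension-matching shortcuts are omitted for brevity; the channel count is an assumption):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Minimal residual block: conv-ReLU-conv computes F(x), then the
    identity x is added back, giving H(x) = F(x) + x."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        fx = self.conv2(F.relu(self.conv1(x)))  # F(x): conv-ReLU-conv
        return F.relu(fx + x)                   # H(x) = F(x) + x

x = torch.randn(1, 64, 56, 56)       # assumed feature-map size
print(ResidualBlock(64)(x).shape)    # torch.Size([1, 64, 56, 56])
```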

a) ResNet 34-layer architecture b) Residual block

VGG16 and VGG19: The VGGNet architecture comes in two variants, with 16 and 19 layers: VGG16 and VGG19. The layer count refers to the number of weight layers. The VGG16 architecture consists of 13 convolutional and 3 fully connected layers and was used to achieve better results in the ImageNet 2014 competition [5]. There are 41 layers in total, including MaxPooling, FullyConnectedLayer, ReLULayer, DropOutLayer, and SoftmaxLayer layers. The image fed to the input layer is 224x224x3, and the last layer is the classification layer [6].

The VGGNet architecture uses 3x3 filters in all of its convolutional layers and stacks Convolution-ReLU pairs before each pooling layer. As in other deep architectures, the height and width of the feature maps decrease from the input layer to the output while the depth increases. In 2014, it achieved a top-5 error rate of 7.3%. The VGGNet architecture is shown in the figure [5].
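
A minimal sketch of one VGG-style stage under these rules: stacked 3x3 Conv-ReLU layers followed by 2x2 max pooling, which halves the height and width while the channel depth grows (channel counts here are assumptions for illustration):

```python
import torch
import torch.nn as nn

def vgg_stage(in_ch, out_ch):
    # Two stacked 3x3 Conv-ReLU layers, then 2x2 max pooling.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2))

x = torch.randn(1, 3, 224, 224)  # a 224x224x3 input image
x = vgg_stage(3, 64)(x)          # -> torch.Size([1, 64, 112, 112])
x = vgg_stage(64, 128)(x)        # -> torch.Size([1, 128, 56, 56])
print(x.shape)                   # height/width halve, depth grows
```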


VGGNet architecture



REFERENCES

[1] Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12), 1097–1105. Lake Tahoe, Nevada.

[2] Zeiler, M. D., and Fergus, R. (2014). Visualizing and Understanding Convolutional Networks. Computer Vision - ECCV 2014, 818–833. doi: 10.1007/978-3-319-10590-1_53

[3] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., and Rabinovich, A. (2015). Going Deeper with Convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9. Boston, MA, USA: IEEE. doi: 10.1109/CVPR.2015.7298594

[4] He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–12. Las Vegas, NV, USA: IEEE. doi: 10.1109/CVPR.2016.90

[5] Simonyan, K., and Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. Web: https://arxiv.org/abs/1409.1556

[6] Doğan, F., and Türkoğlu, İ. (2018). Derin Öğrenme Algoritmalarının Yaprak Sınıflandırma Başarımlarının Karşılaştırılması [Comparison of the Leaf Classification Performance of Deep Learning Algorithms]. Sakarya University Journal of Computer and Information Sciences, 1, 10–21.

[7] Savaş, S. (2019). Karotis Arter Intima Media Kalınlığının Derin Öğrenme ile Sınıflandırılması [Classification of Carotid Artery Intima-Media Thickness with Deep Learning]. PhD thesis, Gazi University, Graduate School of Natural and Applied Sciences, Department of Computer Engineering, Ankara.

