Classic Deep Network Models for Image Classification

1 Introduction

The best way to get started with deep learning is to read code.

https://github.com/fchollet/deep-learning-models

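From VGG, Inception, and Inception V3 to ResNet and Xception, I read through the code once and can only sigh: we know nothing about deep learning. The motivation for VGG is the observation that small 1×1 and 3×3 convolutions combined with greater depth improve image classification. Inception aims to keep the depth while reducing computation as much as possible, and therefore proposes the Inception module, which reduces dimensionality. Xception uses a depthwise separable convolution structure. He et al. observed that although batch normalization keeps the signal (the scale of the activation distribution) from vanishing, the training error still fails to decrease as depth increases, and therefore proposed the ResNet structure, which makes deeper networks trainable.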

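Deep learning feels a bit like black magic: you suddenly feel you have understood something, yet you cannot fully explain why it works. These methods are simple enough to implement in a few dozen lines of Keras code, yet they are remarkably effective.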

2 Models

2.1 VGG

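VGG is a stack of many small convolutions and pooling layers, followed by two 4096-unit fully connected layers and a 1000-way softmax classification layer.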
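One way to see why stacking small convolutions is attractive is a quick parameter count. The sketch below compares three stacked 3×3 layers with a single 7×7 layer covering the same receptive field. The channel width of 256 and the helper names are assumptions chosen for illustration, and biases are ignored.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in one kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


channels = 256  # assumed channel width, for illustration only

# Three stacked 3x3 layers see the same 7x7 window as one 7x7 layer.
stacked_3x3 = 3 * conv_params(3, channels, channels)
single_7x7 = conv_params(7, channels, channels)

print(receptive_field(3, 3), receptive_field(7, 1))
print(stacked_3x3, single_7x7)
```

Under these assumptions the stacked version needs 27·C² weights against 49·C² for the single large kernel, and it also applies a nonlinearity after each of the three layers.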

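The original paper describes several configurations, labeled A to E, with 11 to 19 weight layers.

Keras provides pretrained weights for VGG16 and VGG19. At the time, a 19-layer network was already considered very deep, hence the name "Very Deep CNN". To train it, the authors first trained a shallower network and then used its weights to initialize the deeper ones. With good initialization methods, this kind of pretraining is no longer necessary.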

2.2 From Inception to Xception

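An Inception module is a combination of small 1×1, 3×3, and 5×5 convolutions applied in parallel. The network is a stack of convolutions, pooling layers, and Inception modules. The large fully connected layers at the top are removed as well, with global average pooling placed before the classifier.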
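The dimension reduction mentioned in the introduction comes from placing a 1×1 convolution in front of the more expensive 3×3 and 5×5 branches. A minimal sketch of the saving for a 5×5 branch follows. The channel sizes (192 input channels, 16 reduced channels, 32 output channels) are assumed values for illustration, and biases are ignored.

```python
def conv_params(kernel, c_in, c_out):
    """Number of weights in one kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


# Assumed channel sizes, for illustration only.
c_in, c_reduced, c_out = 192, 16, 32

# 5x5 convolution applied directly to the wide input.
direct = conv_params(5, c_in, c_out)

# 1x1 convolution first shrinks the channel dimension, then the 5x5 runs.
bottleneck = conv_params(1, c_in, c_reduced) + conv_params(5, c_reduced, c_out)

print(direct, bottleneck)
```

With these numbers the bottleneck branch uses roughly a tenth of the weights of the direct 5×5 convolution, and the multiply-add count shrinks by the same factor because both layers run over the same spatial grid.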

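In V3, the 5×5 convolution in the module is replaced by two stacked 3×3 convolutions, which further reduces the number of parameters. An auxiliary classifier is also used to aid training. Inception V3 uses three kinds of Inception modules in total, and the 7×7 convolution in the first layer is replaced by 3×3 convolutions as well.

So the idea behind Inception is essentially Network in Network.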

Xception goes one step further: it adopts depthwise separable convolutions and adds ResNet-style residual connections.
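A depthwise separable convolution splits a regular convolution into two steps: a spatial convolution applied to each channel on its own, then a 1×1 convolution that mixes channels. The following is a minimal pure-Python sketch on nested lists, assuming valid padding and stride 1. The function names are hypothetical and this is not how Keras implements the layer.

```python
def depthwise_conv(x, kernels):
    """Apply one k x k kernel per channel (valid padding, stride 1).

    x: list of channels, each a 2D list (H x W).
    kernels: one 2D k x k kernel per input channel.
    """
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        h, w = len(channel), len(channel[0])
        out.append([
            [
                sum(channel[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(w - k + 1)
            ]
            for i in range(h - k + 1)
        ])
    return out


def pointwise_conv(x, weights):
    """1x1 convolution: mix channels at every spatial position.

    weights[o][c] is the weight from input channel c to output channel o.
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wc * x[c][i][j] for c, wc in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def separable_conv(x, kernels, weights):
    """Depthwise convolution followed by a pointwise (1x1) convolution."""
    return pointwise_conv(depthwise_conv(x, kernels), weights)


def param_counts(k, c_in, c_out):
    """(standard, separable) weight counts, biases ignored."""
    return k * k * c_in * c_out, k * k * c_in + c_in * c_out


print(param_counts(3, 256, 256))
```

The parameter count shows where the saving comes from: the spatial kernels no longer depend on the number of output channels, so the k²·C_in·C_out term becomes k²·C_in + C_in·C_out.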

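2.3 ResNet

He et al. observed that although batch normalization keeps the signal from vanishing, the training error still fails to decrease as depth increases, and therefore proposed the residual network (ResNet). Because each block has the form y = x + f(x), the identity shortcut gives the gradient a direct path back to earlier layers, which mitigates vanishing gradients. In the original version, the mapping comes first and the activation follows, so the signal leaving BN is immediately added to another signal and its distribution is no longer zero-mean with unit variance. The improved version applies the activation first and the mapping afterwards, which avoids this problem.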
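The effect of the y = x + f(x) form on gradients can be illustrated with a toy scalar model. This is only a sketch under assumed settings: each "layer" is f(x) = tanh(w·x) with a small weight w = 0.1, there are 30 layers, and the gradient is estimated by central differences. It is not the ResNet architecture itself.

```python
import math


def plain_stack(x, weights):
    """Plain network: each layer replaces x with f(x) = tanh(w * x)."""
    for w in weights:
        x = math.tanh(w * x)
    return x


def residual_stack(x, weights):
    """Residual network: each block computes y = x + f(x)."""
    for w in weights:
        x = x + math.tanh(w * x)
    return x


def numerical_grad(fn, x, weights, eps=1e-6):
    """Central-difference estimate of d fn / d x."""
    return (fn(x + eps, weights) - fn(x - eps, weights)) / (2 * eps)


weights = [0.1] * 30  # assumed small weights, for illustration only
plain_grad = numerical_grad(plain_stack, 0.5, weights)
residual_grad = numerical_grad(residual_stack, 0.5, weights)

print(plain_grad, residual_grad)
```

In the plain stack, the derivative is a product of 30 factors no larger than 0.1, so it collapses toward zero. In the residual stack, every factor has the form 1 + f'(x), so the product stays at or above 1 in this toy setting.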

3 Classification Results

I classified an image using the released network weights.

The classification results are:

Model	Result
VGG16	[[('pizza', 0.46461356), ('toilet_seat', 0.13691212), ('frying_pan', 0.028918281), ('barrel', 0.027271563), ('corn', 0.026535364)]]
InceptionV3	[[('pizza', 0.74566227)]]
Xception	[[('pizza', 0.86967695), ('caldron', 0.027229017), ('Crock_Pot', 0.0048811049), ('frying_pan', 0.0034465457), ('Dutch_oven', 0.0033302067)]]
ResNet	[[('pizza', 0.98416579), ('toilet_seat', 0.0087018134), ('spaghetti_squash', 0.0023796151), ('frying_pan', 0.00098949193), ('meat_loaf', 0.00086765102)]]
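Each result above is a list of (label, probability) pairs sorted by probability. A minimal sketch of how such a top-5 list can be produced from raw class scores is shown here. The labels and scores are hypothetical values for illustration, and the helper functions are a stand-in rather than the decoding utility shipped with the models.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=5):
    """Return the k most probable (label, probability) pairs."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical labels and scores, for illustration only.
labels = ["pizza", "toilet_seat", "frying_pan", "barrel", "corn", "wig"]
scores = [4.0, 2.5, 1.0, 0.9, 0.8, -1.0]

predictions = top_k(softmax(scores), labels)
for label, prob in predictions:
    print(label, round(prob, 4))
```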

I then tested another image, and the results are truly "impressive". orz

Model	Result
inception	web_site (100% confidence!!)
vgg16	wig
vgg19	wig
xception	ping-pong_ball
ResNet	Band_Aid

References

[1] VGG: Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

[2] Inception: Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

[3] Inception V3: Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

[4] Xception: Chollet, François. "Xception: Deep Learning with Depthwise Separable Convolutions." arXiv preprint arXiv:1610.02357 (2016).

[5] ResNet: He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

[6] ResNet V2: He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer International Publishing, 2016.
