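ResNet significantly changed the view of how to parametrize the functions in deep networks. DenseNet (dense convolutional network) is to some extent the logical extension of this (Huang et al., 2017). DenseNet is characterized by two things: a connectivity pattern in which each layer connects to all preceding layers, and the concatenation operation (rather than the addition operator in ResNet), which preserves and reuses features from earlier layers. To understand how to arrive at it, let's take a small detour to mathematics.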
# PyTorch
import torch
from torch import nn
from d2l import torch as d2l

# MXNet
from mxnet import init, np, npx
from mxnet.gluon import nn
from d2l import mxnet as d2l

npx.set_np()

# JAX
import jax
from flax import linen as nn
from jax import numpy as jnp
from d2l import jax as d2l

# TensorFlow
import tensorflow as tf
from d2l import tensorflow as d2l
8.7.1. From ResNet to DenseNet
Recall the Taylor expansion for functions. At the point $x = 0$ it can be written as
$$f(x) = f(0) + x \cdot \left[f'(0) + x \cdot \left[\frac{f''(0)}{2!} + x \cdot \left[\frac{f'''(0)}{3!} + \ldots \right]\right]\right]. \tag{8.7.1}$$
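The nested form of (8.7.1) can be checked numerically. The following is a minimal sketch, assuming a truncated series and using the exponential function as the example (all of its derivatives at 0 equal 1). The helper names `taylor_nested` and `taylor_direct` are made up for this illustration.

```python
import math

def taylor_nested(derivs, x):
    """Evaluate f(0) + x*[f'(0) + x*[f''(0)/2! + x*[f'''(0)/3! + ...]]].

    derivs[n] is the n-th derivative of f at 0 (truncated series).
    """
    result = 0.0
    # Work from the innermost bracket outwards.
    for n in reversed(range(len(derivs))):
        result = derivs[n] / math.factorial(n) + x * result
    return result

def taylor_direct(derivs, x):
    """Evaluate the same truncated series term by term."""
    return sum(d / math.factorial(n) * x**n for n, d in enumerate(derivs))

# Every derivative of exp at 0 equals 1; keep the first 15 terms.
derivs = [1.0] * 15
x = 0.5
nested = taylor_nested(derivs, x)
direct = taylor_direct(derivs, x)
print(nested, direct, math.exp(x))
```

Both evaluations agree up to rounding, and each additional bracket contributes one higher-order term.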
The key point is that it decomposes a function into terms of increasingly higher order. In a similar vein, ResNet decomposes functions into
$$f(\mathbf{x}) = \mathbf{x} + g(\mathbf{x}). \tag{8.7.2}$$
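That is, ResNet decomposes $f$ into a simple linear term and a more complex nonlinear one. What if we wanted to capture (not necessarily add) information beyond two terms? One such solution is DenseNet (Huang et al., 2017).
Fig. 8.7.1 The main difference between ResNet (left) and DenseNet (right) in cross-layer connections: use of addition and use of concatenation.
As shown in Fig. 8.7.1, the key difference between ResNet and DenseNet is that in the latter case outputs are concatenated (denoted by $[,]$) rather than added. As a result, we perform a mapping from $\mathbf{x}$ to its values after applying an increasingly complex sequence of functions: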
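$$\mathbf{x} \to \left[\mathbf{x}, f_1(\mathbf{x}), f_2\left(\left[\mathbf{x}, f_1\left(\mathbf{x}\right)\right]\right), f_3\left(\left[\mathbf{x}, f_1\left(\mathbf{x}\right), f_2\left(\left[\mathbf{x}, f_1\left(\mathbf{x}\right)\right]\right)\right]\right), \ldots\right]. \tag{8.7.3}$$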
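In the end, all these functions are combined in an MLP to reduce the number of features again. In terms of implementation this is quite simple: rather than adding terms, we concatenate them. The name DenseNet arises from the fact that the dependency graph between variables becomes quite dense. The final layer of such a chain is densely connected to all previous layers. The dense connections are shown in Fig. 8.7.2.
Fig. 8.7.2 Dense connections in DenseNet. Note how the dimensionality increases with depth.
The main components that comprise a DenseNet are dense blocks and transition layers. The former define how the inputs and outputs are concatenated, while the latter control the number of channels so that it is not too large, since the expansion $\mathbf{x} \to \left[\mathbf{x}, f_1(\mathbf{x}), f_2\left(\left[\mathbf{x}, f_1\left(\mathbf{x}\right)\right]\right), \ldots \right]$ can be quite high-dimensional.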
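To make the contrast between (8.7.2) and (8.7.3) concrete, here is a small sketch on plain Python lists. The functions `f1`, `f2`, and `g` are arbitrary toy functions chosen only for illustration, and `dense_chain` and `residual_chain` are hypothetical helper names; none of them come from the text.

```python
def dense_chain(x, functions):
    """Return [x, f1(x), f2([x, f1(x)]), ...] as one flat feature list."""
    features = list(x)
    for f in functions:
        # Each function sees everything produced so far ...
        new = f(features)
        # ... and its output is concatenated to the features, not added.
        features = features + new
    return features

def residual_chain(x, functions):
    """ResNet-style counterpart: x <- x + g(x), so the size never changes."""
    features = list(x)
    for g in functions:
        features = [a + b for a, b in zip(features, g(features))]
    return features

# Toy functions (assumptions for this sketch only).
f1 = lambda v: [sum(v)]              # contributes 1 new feature
f2 = lambda v: [max(v), min(v)]      # contributes 2 new features
g = lambda v: [0.1 * a for a in v]   # same size as its input

x = [1.0, 2.0, 3.0]
dense = dense_chain(x, [f1, f2])
resid = residual_chain(x, [g, g])
print(len(x), len(dense), len(resid))
```

The concatenated representation grows with every function applied, while the residual one keeps its size. This growth is what transition layers later keep in check.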
8.7.2. Dense Blocks
DenseNet uses the modified "batch normalization, activation, and convolution" structure of ResNet (see the exercise in Section 8.6). First, we implement this convolution block structure.
# PyTorch
def conv_block(num_channels):
    # Batch normalization -> ReLU -> 3x3 convolution (height and width kept)
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=3, padding=1))

# MXNet
def conv_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(),
            nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=3, padding=1))
    return blk

# JAX
class ConvBlock(nn.Module):
    num_channels: int
    training: bool = True

    @nn.compact
    def __call__(self, X):
        Y = nn.relu(nn.BatchNorm(not self.training)(X))
        Y = nn.Conv(self.num_channels, kernel_size=(3, 3), padding=(1, 1))(Y)
        # Concatenate input and output along the channel (last) axis
        Y = jnp.concatenate((X, Y), axis=-1)
        return Y
# TensorFlow
class ConvBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels):
        super(ConvBlock, self).__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(
            filters=num_channels, kernel_size=(3, 3), padding='same')
        self.listLayers = [self.bn, self.relu, self.conv]

    def call(self, x):
        y = x
        # Iterate over the list directly rather than through a `.layers`
        # attribute, which plain Python lists do not have
        for layer in self.listLayers:
            y = layer(y)
        # Concatenate input and output along the channel (last) axis
        y = tf.keras.layers.concatenate([x, y], axis=-1)
        return y
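A dense block consists of multiple convolution blocks, each using the same number of output channels. In the forward propagation, however, we concatenate the input and output of each convolution block on the channel dimension. Lazy evaluation allows us to adjust the dimensionality automatically.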
# PyTorch
class DenseBlock(nn.Module):
    def __init__(self, num_convs, num_channels):
        super(DenseBlock, self).__init__()
        layer = []
        for i in range(num_convs):
            layer.append(conv_block(num_channels))
        self.net = nn.Sequential(*layer)

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output of each block along the channels
            X = torch.cat((X, Y), dim=1)
        return X

# MXNet
class DenseBlock(nn.Block):
    def __init__(self, num_convs, num_channels):
        super().__init__()
        self.net = nn.Sequential()
        for _ in range(num_convs):
            self.net.add(conv_block(num_channels))

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            # Concatenate input and output of each block along the channels
            X = np.concatenate((X, Y), axis=1)
        return X

# JAX
class DenseBlock(nn.Module):
    num_convs: int
    num_channels: int
    training: bool = True

    def setup(self):
        layer = []
        for i in range(self.num_convs):
            layer.append(ConvBlock(self.num_channels, self.training))
        self.net = nn.Sequential(layer)

    def __call__(self, X):
        # Each ConvBlock already concatenates its input and output
        return self.net(X)
# TensorFlow
class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_convs, num_channels):
        super(DenseBlock, self).__init__()
        self.listLayers = []
        for _ in range(num_convs):
            self.listLayers.append(ConvBlock(num_channels))

    def call(self, x):
        # Each ConvBlock already concatenates its input and output
        for layer in self.listLayers:
            x = layer(x)
        return x
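In the following example, we define a DenseBlock instance with two convolution blocks of 10 output channels each. When using an input with three channels, we get an output with 3 + 10 + 10 = 23 channels. The number of convolution block channels controls the growth in the number of output channels relative to the number of input channels. This is also referred to as the growth rate.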
# PyTorch
blk = DenseBlock(2, 10)
X = torch.randn(4, 3, 8, 8)
Y = blk(X)
Y.shape

torch.Size([4, 23, 8, 8])

# MXNet
blk = DenseBlock(2, 10)
X = np.random.uniform(size=(4, 3, 8, 8))
blk.initialize()
Y = blk(X)
Y.shape

(4, 23, 8, 8)

# JAX
blk = DenseBlock(2, 10)
X = jnp.zeros((4, 8, 8, 3))
Y = blk.init_with_output(d2l.get_key(), X)[0]
Y.shape

(4, 8, 8, 23)

# TensorFlow
blk = DenseBlock(2, 10)
X = tf.random.uniform((4, 8, 8, 3))
Y = blk(X)
Y.shape

TensorShape([4, 8, 8, 23])
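The channel arithmetic behind this example (3 + 10 + 10 = 23) generalizes to any dense block. The sketch below is framework-free; the helper names `conv_block_in_channels` and `dense_block_out_channels` are hypothetical and only restate the rule that each convolution block appends `growth_rate` channels to its input.

```python
def conv_block_in_channels(in_channels, num_convs, growth_rate):
    """Input channels seen by each convolution block inside a dense block."""
    # Block i receives the original input plus the outputs of the i
    # blocks before it; this is what the lazy layers infer at run time.
    return [in_channels + i * growth_rate for i in range(num_convs)]

def dense_block_out_channels(in_channels, num_convs, growth_rate):
    """Output channels of a dense block after all concatenations."""
    return in_channels + num_convs * growth_rate

# The example from the text: 3 input channels, 2 blocks, growth rate 10.
per_block = conv_block_in_channels(3, 2, 10)
out_channels = dense_block_out_channels(3, 2, 10)
print(per_block, out_channels)
```

The second convolution block therefore sees 13 input channels, and the block as a whole outputs 23.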
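8.7.3. Transition Layers
Since each dense block increases the number of channels, adding too many of them leads to an excessively complex model. A transition layer is used to control the complexity of the model. It reduces the number of channels by using a 1×1 convolution. Moreover, it halves the height and width via average pooling with a stride of 2.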
# PyTorch
def transition_block(num_channels):
    # 1x1 convolution sets the channel count; pooling halves height and width
    return nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.LazyConv2d(num_channels, kernel_size=1),
        nn.AvgPool2d(kernel_size=2, stride=2))

# MXNet
def transition_block(num_channels):
    blk = nn.Sequential()
    blk.add(nn.BatchNorm(), nn.Activation('relu'),
            nn.Conv2D(num_channels, kernel_size=1),
            nn.AvgPool2D(pool_size=2, strides=2))
    return blk

# JAX
class TransitionBlock(nn.Module):
    num_channels: int
    training: bool = True

    @nn.compact
    def __call__(self, X):
        X = nn.BatchNorm(not self.training)(X)
        X = nn.relu(X)
        X = nn.Conv(self.num_channels, kernel_size=(1, 1))(X)
        X = nn.avg_pool(X, window_shape=(2, 2), strides=(2, 2))
        return X

# TensorFlow
class TransitionBlock(tf.keras.layers.Layer):
    def __init__(self, num_channels, **kwargs):
        super(TransitionBlock, self).__init__(**kwargs)
        self.batch_norm = tf.keras.layers.BatchNormalization()
        self.relu = tf.keras.layers.ReLU()
        self.conv = tf.keras.layers.Conv2D(num_channels, kernel_size=1)
        self.avg_pool = tf.keras.layers.AvgPool2D(pool_size=2, strides=2)

    def call(self, x):
        x = self.batch_norm(x)
        x = self.relu(x)
        x = self.conv(x)
        return self.avg_pool(x)
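Apply a transition layer with 10 channels to the output of the dense block in the previous example. This reduces the number of output channels to 10 and halves the height and width.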
# PyTorch
blk = transition_block(10)
blk(Y).shape

torch.Size([4, 10, 4, 4])

# MXNet
blk = transition_block(10)
blk.initialize()
blk(Y).shape

(4, 10, 4, 4)

# JAX
blk = TransitionBlock(10)
blk.init_with_output(d2l.get_key(), Y)[0].shape

(4, 4, 4, 10)

# TensorFlow
blk = TransitionBlock(10)
blk(Y).shape

TensorShape([4, 4, 4, 10])
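To see why a transition layer changes the shape in this way, the two shape-changing steps can be written out by hand. The following is a simplified sketch on nested Python lists: it omits the batch dimension, batch normalization, ReLU, and the convolution bias, and the helper names `conv1x1`, `avg_pool2x2`, and `shape` as well as the input values and weights are assumptions for illustration only.

```python
def conv1x1(x, weight):
    """1x1 convolution: x is [C_in][H][W], weight is [C_out][C_in]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    # Every pixel is mixed across channels independently of its neighbours.
    return [[[sum(weight[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(w)]
             for i in range(h)]
            for o in range(len(weight))]

def avg_pool2x2(x):
    """2x2 average pooling with stride 2 on [C][H][W] (H and W even)."""
    return [[[(ch[i][j] + ch[i][j + 1] + ch[i + 1][j] + ch[i + 1][j + 1]) / 4
              for j in range(0, len(ch[0]), 2)]
             for i in range(0, len(ch), 2)]
            for ch in x]

def shape(x):
    return (len(x), len(x[0]), len(x[0][0]))

# 23 channels of size 8x8, matching the dense block output above.
x = [[[float(c + i + j) for j in range(8)] for i in range(8)]
     for c in range(23)]
# 10 output channels; each simply averages the 23 input channels.
weight = [[1.0 / 23] * 23 for _ in range(10)]
y = avg_pool2x2(conv1x1(x, weight))
print(shape(x), '->', shape(y))
```

The 1×1 convolution alone determines the channel count (23 to 10), and the pooling alone determines the spatial size (8×8 to 4×4).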
8.7.4. DenseNet Model
Next, we will construct a DenseNet model. DenseNet first uses the same single convolutional layer and max-pooling layer as in ResNet.
# PyTorch
class DenseNet(d2l.Classifier):
    def b1(self):
        return nn.Sequential(
            nn.LazyConv2d(64, kernel_size=7, stride=2, padding=3),
            nn.LazyBatchNorm2d(), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1))

# MXNet
class DenseNet(d2l.Classifier):
    def b1(self):
        net = nn.Sequential()
        net.add(nn.Conv2D(64, kernel_size=7, strides=2, padding=3),
                nn.BatchNorm(), nn.Activation('relu'),
                nn.MaxPool2D(pool_size=3, strides=2, padding=1))
        return net

# JAX
class DenseNet(d2l.Classifier):
    num_channels: int = 64
    growth_rate: int = 32
    arch: tuple = (4, 4, 4, 4)
    lr: float = 0.1
    num_classes: int = 10
    training: bool = True

    def setup(self):
        self.net = self.create_net()

    def b1(self):
        return nn.Sequential([
            nn.Conv(64, kernel_size=(7, 7), strides=(2, 2), padding='same'),
            nn.BatchNorm(not self.training),
            nn.relu,
            lambda x: nn.max_pool(x, window_shape=(3, 3),
                                  strides=(2, 2), padding='same')
        ])

# TensorFlow
class DenseNet(d2l.Classifier):
    def b1(self):
        return tf.keras.models.Sequential([
            tf.keras.layers.Conv2D(
                64, kernel_size=7, strides=2, padding='same'),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.MaxPool2D(
                pool_size=3, strides=2, padding='same')])
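Then, similar to the four modules made up of residual blocks that ResNet uses, DenseNet uses four dense blocks. As with ResNet, we can set the number of convolutional layers used in each dense block. Here, we set it to 4, consistent with the ResNet-18 model in Section 8.6. Furthermore, we set the number of channels (i.e., the growth rate) for the convolutional layers in the dense block to 32, so each dense block adds 4 × 32 = 128 channels.
In ResNet, the height and width are reduced between modules by a residual block with a stride of 2. Here, we use the transition layer to halve the height and width and to halve the number of channels. As in ResNet, a global pooling layer and a fully connected layer are connected at the end to produce the output.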
# PyTorch
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = nn.Sequential(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add_module(f'dense_blk{i+1}', DenseBlock(num_convs,
                                                          growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add_module(f'tran_blk{i+1}', transition_block(
                num_channels))
    self.net.add_module('last', nn.Sequential(
        nn.LazyBatchNorm2d(), nn.ReLU(),
        nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(),
        nn.LazyLinear(num_classes)))
    self.net.apply(d2l.init_cnn)

# MXNet
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = nn.Sequential()
    self.net.add(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add(DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add(transition_block(num_channels))
    self.net.add(nn.BatchNorm(), nn.Activation('relu'),
                 nn.GlobalAvgPool2D(), nn.Dense(num_classes))
    self.net.initialize(init.Xavier())
# JAX
@d2l.add_to_class(DenseNet)
def create_net(self):
    net = self.b1()
    # Track the running channel count locally so that it accumulates across
    # dense blocks, as in the other implementations
    num_channels = self.num_channels
    for i, num_convs in enumerate(self.arch):
        net.layers.extend([DenseBlock(num_convs, self.growth_rate,
                                      training=self.training)])
        # The number of output channels in the previous dense block
        num_channels += num_convs * self.growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(self.arch) - 1:
            num_channels //= 2
            net.layers.extend([TransitionBlock(num_channels,
                                               training=self.training)])
    net.layers.extend([
        nn.BatchNorm(not self.training),
        nn.relu,
        # Global average pooling over the full height and width
        lambda x: nn.avg_pool(x, window_shape=x.shape[1:3],
                              strides=x.shape[1:3], padding='valid'),
        lambda x: x.reshape((x.shape[0], -1)),
        nn.Dense(self.num_classes)
    ])
    return net
# TensorFlow
@d2l.add_to_class(DenseNet)
def __init__(self, num_channels=64, growth_rate=32, arch=(4, 4, 4, 4),
             lr=0.1, num_classes=10):
    super(DenseNet, self).__init__()
    self.save_hyperparameters()
    self.net = tf.keras.models.Sequential(self.b1())
    for i, num_convs in enumerate(arch):
        self.net.add(DenseBlock(num_convs, growth_rate))
        # The number of output channels in the previous dense block
        num_channels += num_convs * growth_rate
        # A transition layer that halves the number of channels is added
        # between the dense blocks
        if i != len(arch) - 1:
            num_channels //= 2
            self.net.add(TransitionBlock(num_channels))
    self.net.add(tf.keras.models.Sequential([
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.ReLU(),
        tf.keras.layers.GlobalAvgPool2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes)]))
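The channel bookkeeping in the constructor can be traced without building the network. The sketch below mirrors only the `num_channels` updates from the loop above, with the same default hyperparameters; the function name `densenet_channels` is a hypothetical helper, not part of `d2l`.

```python
def densenet_channels(num_channels=64, growth_rate=32, arch=(4, 4, 4, 4)):
    """Trace the channel count after the stem, each dense block and
    each transition layer."""
    trace = [('stem', num_channels)]
    for i, num_convs in enumerate(arch):
        # A dense block appends growth_rate channels per convolution block.
        num_channels += num_convs * growth_rate
        trace.append((f'dense_blk{i+1}', num_channels))
        # A transition layer halves the channels between dense blocks.
        if i != len(arch) - 1:
            num_channels //= 2
            trace.append((f'tran_blk{i+1}', num_channels))
    return trace

trace = densenet_channels()
for name, channels in trace:
    print(f'{name:>10}: {channels}')
```

With the defaults, the count rises by 128 in every dense block and is halved by every transition layer, ending at 248 channels before the final pooling and fully connected layers.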
8.7.5. Training
Since we are using a deeper network here, in this section we reduce the input height and width from 224 to 96 to simplify the computation.
# PyTorch, MXNet, and JAX (identical code)
model = DenseNet(lr=0.01)
trainer = d2l.Trainer(max_epochs=10, num_gpus=1)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
trainer.fit(model, data)

# TensorFlow
trainer = d2l.Trainer(max_epochs=10)
data = d2l.FashionMNIST(batch_size=128, resize=(96, 96))
with d2l.try_gpu():
    model = DenseNet(lr=0.01)
trainer.fit(model, data)
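The effect of shrinking the input from 224 to 96 can be estimated from the layer hyperparameters alone. The sketch below applies the standard output-size formula for convolution and pooling, floor((n + 2p - k) / s) + 1, to the stem and the three transition layers defined earlier, using the explicit padding of the PyTorch and MXNet versions. The helper names `conv_out` and `densenet_spatial_sizes` are assumptions for this illustration.

```python
def conv_out(n, kernel, stride, padding):
    """Output size of a convolution or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

def densenet_spatial_sizes(n, num_transitions=3):
    """Height (= width) after the stem and after each transition layer."""
    sizes = [n]
    n = conv_out(n, kernel=7, stride=2, padding=3)   # stem convolution
    sizes.append(n)
    n = conv_out(n, kernel=3, stride=2, padding=1)   # stem max-pooling
    sizes.append(n)
    for _ in range(num_transitions):
        n = conv_out(n, kernel=2, stride=2, padding=0)  # average pooling
        sizes.append(n)
    return sizes

small = densenet_spatial_sizes(96)
large = densenet_spatial_sizes(224)
print(small)
print(large)
# Ratio of pixel positions processed in the first dense block.
print((large[2] / small[2]) ** 2)
```

Since the cost of a convolution scales with the number of spatial positions, the first dense block handles roughly 5.4 times fewer positions at 96×96 than at 224×224.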
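8.7.6. Summary and Discussion
The main components that comprise DenseNet are dense blocks and transition layers. For the latter, we need to keep the dimensionality under control when composing the network by adding transition layers that shrink the number of channels again. In terms of cross-layer connections, in contrast to ResNet, where inputs and outputs are added together, DenseNet concatenates inputs and outputs on the channel dimension. Although these concatenation operations reuse features to achieve computational efficiency, unfortunately they lead to heavy GPU memory consumption. As a result, applying DenseNet may require more memory-efficient implementations that may increase training time (Pleiss et al., 2017).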
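A rough count illustrates where the memory pressure comes from. The sketch below assumes a naive implementation in which every concatenation allocates a new tensor that is kept for the backward pass, and compares it with a single buffer holding each feature map once. It counts only the concatenated tensors and ignores batch normalization and ReLU intermediates, so the numbers are an illustration rather than a measurement. The function names are hypothetical.

```python
def concat_channels_naive(in_channels, num_convs, growth_rate):
    """Total channels stored if each concatenation result is a new tensor."""
    total, channels = 0, in_channels
    for _ in range(num_convs):
        channels += growth_rate      # result of torch.cat((X, Y), dim=1)
        total += channels            # kept alive for backpropagation
    return total

def concat_channels_shared(in_channels, num_convs, growth_rate):
    """Channels stored if all feature maps live once in a shared buffer."""
    return in_channels + num_convs * growth_rate

def to_mib(channels, batch, height, width, bytes_per_value=4):
    """Convert a channel count into MiB for float32 feature maps."""
    return channels * batch * height * width * bytes_per_value / 2**20

for num_convs in (4, 16):
    naive = concat_channels_naive(64, num_convs, 32)
    shared = concat_channels_shared(64, num_convs, 32)
    print(num_convs, naive, shared,
          round(to_mib(naive, 128, 24, 24), 1),
          round(to_mib(shared, 128, 24, 24), 1))
```

Under these assumptions the naive count grows quadratically with the number of convolution blocks, while the shared count grows only linearly, which suggests why deeper dense blocks are the ones that benefit most from a memory-efficient implementation.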
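8.7.7. Exercises
1. Why do we use average pooling rather than max-pooling in the transition layer?
2. One of the advantages mentioned in the DenseNet paper is that its model parameters are smaller than those of ResNet. Why is this the case?
3. One problem for which DenseNet has been criticized is its high memory consumption.
   1. Is this really the case? Try to change the input shape to 224×224 to compare the actual GPU memory consumption empirically.
   2. Can you think of an alternative means of reducing the memory consumption? How would you need to change the framework?
4. Implement the various DenseNet versions presented in Table 1 of the DenseNet paper (Huang et al., 2017).
5. Design an MLP-based model by applying the DenseNet idea. Apply it to the housing price prediction task in Section 5.7.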