What is the idea behind using nn.Identity for residual learning?


So, I've read about half of the original ResNet paper, and I'm trying to figure out how to build my own version for tabular data.

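I've read a few blog posts on how this works in PyTorch, and I see heavy use of nn.Identity(). The paper also uses the term "identity mapping" frequently. However, that term just refers to adding the input of a stack of layers to the output of that same stack, element-wise. If the input and output dimensions differ, the paper discusses either padding the input with zeros or using a matrix W_s to project the input to a different dimension.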
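To make the two shortcut options concrete, here is a minimal sketch for a flat feature vector. It assumes a 4-feature input and a 6-feature block output, both sizes made up for illustration; `stack`, `zero_pad_shortcut`, and `projection` are hypothetical names that come from neither the paper nor the blog posts.

```python
import torch
import torch.nn.functional as F
from torch import nn

x = torch.tensor([[1.0, 1.0, 2.0, 2.0]])  # one sample, 4 features

# Stand-in for the stacked layers; maps 4 features to 6 (illustrative sizes).
stack = nn.Sequential(nn.Linear(4, 6), nn.ReLU(), nn.Linear(6, 6))
out = stack(x)


def zero_pad_shortcut(inputs: torch.Tensor, out_features: int) -> torch.Tensor:
    """Option 1: pad the input with zeros up to the output width (no parameters)."""
    missing = out_features - inputs.shape[-1]
    return F.pad(inputs, (0, missing))  # pad only the right side of the last dim


# Option 2: a learned linear projection playing the role of W_s (no bias term).
projection = nn.Linear(4, 6, bias=False)

y_padded = out + zero_pad_shortcut(x, 6)
y_projected = out + projection(x)

print(zero_pad_shortcut(x, 6))  # tensor([[1., 1., 2., 2., 0., 0.]])
print(y_padded.shape, y_projected.shape)
```

When the input and output widths already match, neither option is needed and the shortcut is just the input itself.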

Here is an abstraction of a residual block that I found in a blog post:

    import torch
    from torch import nn


    class ResidualBlock(nn.Module):
        def __init__(self, in_channels, out_channels, activation='relu'):
            super().__init__()
            self.in_channels, self.out_channels, self.activation = in_channels, out_channels, activation
            # Both are placeholders that pass their input through unchanged.
            self.blocks = nn.Identity()
            self.shortcut = nn.Identity()

        def forward(self, x):
            residual = x
            if self.should_apply_shortcut:
                residual = self.shortcut(x)
            x = self.blocks(x)
            x += residual  # in-place element-wise add
            return x

        @property
        def should_apply_shortcut(self):
            return self.in_channels != self.out_channels


    block1 = ResidualBlock(4, 4)

And my own application of it to a dummy tensor:

    import torch

    x = torch.tensor([1, 1, 2, 2])
    block1 = ResidualBlock(4, 4)
    block2 = ResidualBlock(4, 6)

    x = block1(x)
    print(x)  # tensor([2, 2, 4, 4])

    x = block2(x)
    print(x)  # tensor([4, 4, 8, 8])

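So in the end, x just passes through nn.Identity(), and I'm not sure what the point of using it is, other than to mimic the mathematical terminology of the original paper. I'm sure that's not the whole story, though, and that it has some hidden use I'm not seeing yet. What could it be?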
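The doubling seen in the dummy run above can be traced to two facts: nn.Identity returns the very tensor it was given, and `x += residual` adds in place. A minimal check, assuming nothing beyond stock PyTorch:

```python
import torch
from torch import nn

x = torch.tensor([1, 1, 2, 2])
identity = nn.Identity()

# nn.Identity hands back the same tensor object, not a copy.
same_object = identity(x) is x
print(same_object)  # True

# So inside the block, `x` and `residual` are one and the same tensor,
# and the in-place add computes x + x.
residual = x
y = identity(x)
y += residual
print(y)  # tensor([2, 2, 4, 4])
print(x)  # tensor([2, 2, 4, 4])  (the caller's tensor was modified too)
```

This also explains why ResidualBlock(4, 6) leaves the tensor at 4 elements: with both `blocks` and `shortcut` set to nn.Identity, the channel arguments only decide whether the shortcut is called, not what shape comes out.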

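Edit: Here is another example of implementing residual learning, this time in Keras. It does what I suggested above and simply keeps a copy of the input to add to the output: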

    from tensorflow import Tensor
    from tensorflow.keras.layers import Add, Conv2D


    # relu_bn is a helper that is not shown in this post.
    def residual_block(x: Tensor, downsample: bool, filters: int, kernel_size: int = 3) -> Tensor:
        y = Conv2D(kernel_size=kernel_size,
                   strides=(1 if not downsample else 2),
                   filters=filters,
                   padding="same")(x)
        y = relu_bn(y)
        y = Conv2D(kernel_size=kernel_size,
                   strides=1,
                   filters=filters,
                   padding="same")(y)

        if downsample:
            # 1x1 convolution with stride 2 so the shortcut matches the shape of y
            x = Conv2D(kernel_size=1,
                       strides=2,
                       filters=filters,
                       padding="same")(x)
        out = Add()([x, y])
        out = relu_bn(out)
        return out

弑天下


1 Answer

Smart猫小萌

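ResNet implementation

The simplest generic version with projection that I can think of would be something along these lines:

    import torch


    class Residual(torch.nn.Module):
        def __init__(self, module: torch.nn.Module, projection: torch.nn.Module = None):
            super().__init__()
            self.module = module
            self.projection = projection

        def forward(self, inputs):
            output = self.module(inputs)
            # Project the input only when a projection module was supplied.
            if self.projection is not None:
                inputs = self.projection(inputs)
            return output + inputs

You can pass something like two stacked convolutions as module, and add a 1x1 convolution (with padding, strides, or whatever is needed) as the projection module.

For tabular data, you could use this as module (assuming your input has 50 features):

    torch.nn.Sequential(
        torch.nn.Linear(50, 50),
        torch.nn.ReLU(),
        torch.nn.Linear(50, 50),
        torch.nn.ReLU(),
        torch.nn.Linear(50, 50),
    )

Basically, all you have to do is add the input of some module to its output, and that is it.

Rationale behind nn.Identity

It can make neural networks easier to construct (and to read afterwards). An example with batch normalization (taken from the aforementioned PR):

    batch_norm = nn.BatchNorm2d
    if dont_use_batch_norm:
        batch_norm = nn.Identity

Now you can use it with nn.Sequential easily, since nn.Identity accepts and ignores any constructor arguments:

    nn.Sequential(
        ...
        batch_norm(N, momentum=0.05),
        ...
    )

When you print the network, it always has the same number of submodules (with either BatchNorm or Identity), which in my opinion also makes the whole thing a little smoother.

Another use case that has been mentioned is removing parts of an existing neural network:

    import torchvision as tv

    # Newer torchvision versions use the weights= argument instead of pretrained=.
    net = tv.models.alexnet(pretrained=True)
    # Assume net has two parts:
    # features and classifier
    net.classifier = nn.Identity()

Now, instead of running net.features(input), you can run net(input), which may also be easier for others to read.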
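As a hedged sketch of how this could look for tabular data, the variant below defaults the projection to nn.Identity() so that forward needs no if branch. `TabularResidual` and the feature sizes (50 and 64) are illustrative assumptions and are not part of the answer's code.

```python
from typing import Optional

import torch
from torch import nn


class TabularResidual(nn.Module):
    """Adds the (optionally projected) input to the output of `module`."""

    def __init__(self, module: nn.Module, projection: Optional[nn.Module] = None):
        super().__init__()
        self.module = module
        # nn.Identity stands in for "no projection", so forward stays branch-free.
        self.projection = projection if projection is not None else nn.Identity()

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        return self.module(inputs) + self.projection(inputs)


# Same width in and out: the shortcut is a plain identity.
same_width = TabularResidual(
    nn.Sequential(nn.Linear(50, 50), nn.ReLU(), nn.Linear(50, 50))
)

# Width changes from 50 to 64: project the input so the shapes match.
wider = TabularResidual(
    nn.Sequential(nn.Linear(50, 64), nn.ReLU(), nn.Linear(64, 64)),
    projection=nn.Linear(50, 64, bias=False),
)

batch = torch.randn(8, 50)  # 8 rows, 50 features (random stand-in data)
hidden = same_width(batch)
out = wider(hidden)
print(hidden.shape, out.shape)  # torch.Size([8, 50]) torch.Size([8, 64])
```

Printing `same_width` lists an Identity submodule under `projection`, so both blocks show the same set of submodules whether or not a projection is in use.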
