[2023] Day 49: Self-Attention Generative Adversarial Networks

**Authors: Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena**

1. Abstract

  • In this paper, we propose the Self-Attention Generative Adversarial Network (SAGAN), which allows attention-driven, long-range dependency modeling for image generation tasks.

  • Traditional convolutional GANs generate high-resolution details as a function of only spatially local points in lower-resolution feature maps.

  • In SAGAN, details can be generated using cues from all feature locations.

  • Moreover, the discriminator can check that highly detailed features in distant portions of the image are consistent with each other.

  • Furthermore, recent work has shown that generator conditioning affects GAN performance.

  • Leveraging this insight, we apply spectral normalization to the GAN generator and find that this improves training dynamics.

  • The proposed SAGAN performs better than prior work, boosting the best published Inception score from 36.8 to 52.52 and reducing Fréchet Inception distance from 27.62 to 18.65 on the challenging ImageNet dataset.

  • Visualization of the attention layers shows that the generator leverages neighborhoods that correspond to object shapes rather than local regions of fixed shape.
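
For reference (this definition comes from Heusel et al., 2017, not from the SAGAN paper itself), the Fréchet Inception distance cited above measures the distance between Gaussian fits to Inception-network features of the real and generated image distributions, so lower is better:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)$$

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the feature means and covariances computed over real and generated images.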

2. Techniques to Stabilize the Training of GANs

  • We also investigate two techniques to stabilize the training of GANs on challenging datasets.
  • First, we use spectral normalization (Miyato et al., 2018) in the generator as well as in the discriminator.
  • Second, we confirm that the two-timescale update rule (TTUR) (Heusel et al., 2017) is effective, and we advocate using it specifically to address slow learning in regularized discriminators.

4.1 Spectral normalization for both generator and discriminator

  • Miyato et al. originally proposed stabilizing the training of GANs by applying spectral normalization to the discriminator network.
  • Doing so constrains the Lipschitz constant of the discriminator by restricting the spectral norm of each layer.
  • Compared to other normalization techniques, spectral normalization does not require extra hyper-parameter tuning (setting the spectral norm of all weight layers to 1 consistently performs well in practice).
  • Moreover, the computational cost is also relatively small.
  • We argue that the generator can also benefit from spectral normalization, based on recent evidence that the conditioning of the generator is an important causal factor in GANs’ performance.
  • Spectral normalization in the generator can prevent the escalation of parameter magnitudes and avoid unusual gradients.
  • We find empirically that spectral normalization of both generator and discriminator makes it possible to use fewer discriminator updates per generator update, thus significantly reducing the computational cost of training.
  • The approach also shows more stable training behavior (see the sketch below).
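
To make this concrete, here is a minimal PyTorch sketch of applying spectral normalization to the weight layers of both networks, using the built-in `torch.nn.utils.spectral_norm` wrapper (which implements Miyato et al.'s method). The tiny generator and discriminator blocks are placeholders of my own, not the SAGAN architectures:

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectral normalization rescales each wrapped layer's weight so that
# its largest singular value (spectral norm) stays close to 1.
def sn_conv(in_ch, out_ch, **kwargs):
    return spectral_norm(nn.Conv2d(in_ch, out_ch, **kwargs))

# Placeholder blocks (NOT the SAGAN architectures): the point is only
# that both the generator and the discriminator get spectral norm.
generator_block = nn.Sequential(
    spectral_norm(nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1)),
    nn.ReLU(),
    sn_conv(64, 3, kernel_size=3, padding=1),
    nn.Tanh(),
)

discriminator_block = nn.Sequential(
    sn_conv(3, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.1),
    sn_conv(64, 1, kernel_size=4),
)
```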


Figure 2. The proposed self-attention module for the SAGAN. The ⊗ denotes matrix multiplication. The softmax operation is performed on each row.
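
As a companion to Figure 2, below is a simplified PyTorch sketch of a SAGAN-style self-attention block. It follows the figure's structure (1×1 convolutions for f, g, and h, a row-wise softmax over the attention map, matrix multiplication, and a learned residual scale γ initialized to 0), but it omits the channel reduction on h and the final 1×1 convolution used in the authors' released code, so treat it as illustrative rather than the exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Simplified SAGAN-style self-attention block (a sketch, not the
    authors' exact implementation)."""

    def __init__(self, in_channels: int):
        super().__init__()
        # 1x1 convolutions play the roles of f (query), g (key), h (value).
        self.f = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.g = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.h = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # gamma starts at 0, so the block is initially an identity mapping.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, hgt, wid = x.shape
        n = hgt * wid
        q = self.f(x).view(b, -1, n)   # B x C/8 x N
        k = self.g(x).view(b, -1, n)   # B x C/8 x N
        v = self.h(x).view(b, c, n)    # B x C   x N
        # Attention map: softmax is applied to each row (cf. Figure 2),
        # so row j says how much location j attends to every location i.
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # B x N x N
        o = torch.bmm(v, attn.transpose(1, 2)).view(b, c, hgt, wid)
        return self.gamma * o + x      # residual connection
```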

4.2 Imbalanced learning rate for generator and discriminator updates

  • In previous work, regularization of the discriminator often slows down the GANs’ learning process.
  • In practice, methods using regularized discriminators typically require multiple discriminator update steps per generator update step during training.
  • Independently, Heusel et al. have advocated using separate learning rates (TTUR) for the generator and the discriminator.
  • We propose using TTUR specifically to compensate for the problem of slow learning in a regularized discriminator, making it possible to use fewer discriminator steps per generator step.
  • Using this approach, we are able to produce better results given the same wall-clock time (see the sketch below).
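
A minimal sketch of what TTUR looks like in PyTorch: two Adam optimizers with imbalanced learning rates and a single discriminator step per generator step. The 1e-4 / 4e-4 learning rates and the (0, 0.9) Adam betas match the settings reported in the SAGAN paper, and the hinge losses are the ones SAGAN trains with; `G`, `D`, and `data_loader` are assumed to already exist:

```python
import torch

# TTUR: the regularized discriminator learns with a 4x larger learning
# rate than the generator (values from the SAGAN paper).
g_opt = torch.optim.Adam(G.parameters(), lr=1e-4, betas=(0.0, 0.9))
d_opt = torch.optim.Adam(D.parameters(), lr=4e-4, betas=(0.0, 0.9))

for real in data_loader:                  # G, D, data_loader assumed given
    z = torch.randn(real.size(0), 128)    # SAGAN samples z from R^128

    # One discriminator step per generator step (hinge loss).
    d_opt.zero_grad()
    d_loss = (torch.relu(1.0 - D(real)).mean()
              + torch.relu(1.0 + D(G(z).detach())).mean())
    d_loss.backward()
    d_opt.step()

    # Generator step.
    g_opt.zero_grad()
    g_loss = -D(G(z)).mean()
    g_loss.backward()
    g_opt.step()
```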