Batch normalization comes in three API flavors in TensorFlow. The first is the low-level version, which requires you to compute the mean and variance yourself. The other two are higher-level wrappers that can be used directly. Each is introduced below:
1. tf.nn.batch_normalization
This function implements batch normalization in two steps. It is the least encapsulated of the three and is rarely used directly.
(1) tf.nn.moments(x, axes, name=None, keep_dims=False) returns mean, variance:
the statistical moments of x; mean is the first moment and variance is the second central moment.
(2) tf.nn.batch_normalization(x, mean, variance, offset, scale, variance_epsilon, name=None)
As the signatures show, the mean and variance returned by tf.nn.moments are passed on as arguments to tf.nn.batch_normalization.
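Concretely, tf.nn.batch_normalization applies the standard batch-norm transform elementwise:

y = scale * (x - mean) / sqrt(variance + variance_epsilon) + offset

where offset and scale correspond to the learned beta and gamma parameters.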
For example, if the tensor to be normalized has shape [batch_size, height, width, kernels], a sample program for the first step looks like this:
import tensorflow as tf

shape = [128, 32, 32, 64]
a = tf.Variable(tf.random_normal(shape))  # a: activations
axis = list(range(len(shape) - 1))        # equivalently: list(range(len(a.get_shape()) - 1))
a_mean, a_var = tf.nn.moments(a, axis)    # per-channel mean and variance
With a_mean and a_var computed, we can move on to the second step:
tf.nn.batch_normalization(
    x,                  # input tensor
    mean,               # the a_mean computed above
    variance,           # the a_var computed above
    offset,             # tensor: the shift (beta)
    scale,              # tensor: the scale (gamma)
    variance_epsilon,   # small float to avoid division by zero
    name=None)
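Putting the two steps together, a minimal runnable sketch might look like this (the beta/gamma variables and the epsilon value are illustrative choices, not fixed by the API):

import tensorflow as tf

shape = [128, 32, 32, 64]
x = tf.Variable(tf.random_normal(shape))

# step 1: per-channel moments over batch, height and width
axis = list(range(len(shape) - 1))
mean, variance = tf.nn.moments(x, axis)

# step 2: learnable shift/scale, one value per channel
beta = tf.Variable(tf.zeros([shape[-1]]))   # offset
gamma = tf.Variable(tf.ones([shape[-1]]))   # scale
y = tf.nn.batch_normalization(x, mean, variance, beta, gamma, 1e-5)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y).shape)  # (128, 32, 32, 64)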
A complete implementation, including moving averages for inference:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorflow.python.training.moving_averages import assign_moving_average


def batch_norm(x, train, eps=1e-05, decay=0.9, affine=True, name=None):
    with tf.variable_scope(name, default_name='BatchNorm2d'):
        params_shape = x.get_shape().as_list()[-1:]  # one statistic per channel
        moving_mean = tf.get_variable('mean', shape=params_shape,
                                      initializer=tf.zeros_initializer,
                                      trainable=False)
        moving_variance = tf.get_variable('variance', shape=params_shape,
                                          initializer=tf.ones_initializer,
                                          trainable=False)

        def mean_var_with_update():
            axis = list(range(len(x.get_shape()) - 1))
            mean, variance = tf.nn.moments(x, axis, name='moments')
            # update the moving averages before returning the batch statistics
            with tf.control_dependencies([assign_moving_average(moving_mean, mean, decay),
                                          assign_moving_average(moving_variance, variance, decay)]):
                return tf.identity(mean), tf.identity(variance)

        # tf.cond expects a boolean tensor as its predicate, not a plain Python
        # True/False, so the Python bool is wrapped in tf.constant here.
        mean, variance = tf.cond(tf.constant(train),
                                 mean_var_with_update,
                                 lambda: (moving_mean, moving_variance))

        if affine:
            beta = tf.get_variable('beta', params_shape,
                                   initializer=tf.zeros_initializer)
            gamma = tf.get_variable('gamma', params_shape,
                                    initializer=tf.ones_initializer)
            x = tf.nn.batch_normalization(x, mean, variance, beta, gamma, eps)
        else:
            x = tf.nn.batch_normalization(x, mean, variance, None, None, eps)
        return x


shape = [128, 32, 32, 64]
a = tf.Variable(tf.random_normal(shape))
d = batch_norm(a, True)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(a))
print(sess.run(d))
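At test time the same beta/gamma and moving statistics must be reused rather than recreated; one way to do that (a sketch, assuming the function above is given an explicit scope name such as the hypothetical 'bn') is:

d_train = batch_norm(a, True, name='bn')
with tf.variable_scope(tf.get_variable_scope(), reuse=True):
    d_test = batch_norm(a, False, name='bn')  # applies moving_mean / moving_variance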
2. tf.contrib.layers.batch_norm
tf.contrib.layers.batch_norm(
    inputs,                    # input tensor
    decay=0.999,               # decay for the moving averages; good values are close to 1.0,
                               # typically with several 9s: 0.999, 0.99, 0.9. If the model does
                               # well on the training set but poorly on validation/test, try a
                               # lower decay (0.9 is recommended)
    center=True,               # if True, add the beta offset; if False, no beta
    epsilon=0.001,             # small float to avoid division by zero
    scale=False,               # if True, multiply by gamma; if False, gamma is not used. When
                               # the next layer is linear (e.g. nn.relu), this can be disabled,
                               # since the scaling can be done by the next layer
    param_initializers=None,   # initializers for beta, gamma, moving mean and moving variance
    activation_fn=None,        # activation function; defaults to the identity (linear)
    updates_collections=tf.GraphKeys.UPDATE_OPS,
    param_regularizers=None,   # regularizers for beta and gamma
    is_training=True,          # whether the layer is in training mode
    outputs_collections=None,
    reuse=None,
    variables_collections=None,
    data_format=DATA_FORMAT_NHWC,
    trainable=True,
    batch_weights=None,
    fused=None,
    zero_debias_moving_mean=False,  # set to True for better stability
    scope=None,
    renorm=False,
    renorm_clipping=None,
    renorm_decay=0.99,
    adjustment=None)
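Because updates_collections defaults to tf.GraphKeys.UPDATE_OPS, the moving-average updates are not run automatically; the train op has to be made to depend on them. A minimal usage sketch (the placeholder shapes, decay value, and dummy loss are illustrative, not prescribed by the API):

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 64])
is_training = tf.placeholder(tf.bool)

h = tf.contrib.layers.batch_norm(x, decay=0.9, scale=True, is_training=is_training)
loss = tf.reduce_mean(tf.square(h))  # dummy loss, just for the example

# collect the moving-average update ops and run them with each training step
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)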
3. tf.layers.batch_normalization (its parameters mean much the same as those of the second API, so they are not described separately again)
tf.layers.batch_normalization(
    inputs,
    axis=-1,
    momentum=0.99,
    epsilon=0.001,
    center=True,
    scale=True,
    beta_initializer=tf.zeros_initializer(),
    gamma_initializer=tf.ones_initializer(),
    moving_mean_initializer=tf.zeros_initializer(),
    moving_variance_initializer=tf.ones_initializer(),
    beta_regularizer=None,
    gamma_regularizer=None,
    beta_constraint=None,
    gamma_constraint=None,
    training=False,
    trainable=True,
    name=None,
    reuse=None,
    renorm=False,
    renorm_clipping=None,
    renorm_momentum=0.99,
    fused=None,
    virtual_batch_size=None,
    adjustment=None)
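One difference worth noting: training defaults to False here, so it must be set explicitly during training (typically via a placeholder), and the same UPDATE_OPS dependency is needed as with the contrib version. A sketch under those assumptions (shapes and loss again made up for illustration):

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 32, 32, 64])
training = tf.placeholder(tf.bool)

h = tf.layers.batch_normalization(x, momentum=0.9, training=training)
loss = tf.reduce_mean(tf.square(h))  # dummy loss for illustration

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # same pattern as above
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)

batch = np.random.randn(128, 32, 32, 64).astype(np.float32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, feed_dict={x: batch, training: True})  # uses batch statistics
    sess.run(h, feed_dict={x: batch, training: False})        # uses moving statistics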