
Out of memory with TensorFlow GradientTape, but only when I append to a list

I've been working with a CNN on a dataset of shape (1000, 3253). I run the gradient computation through a GradientTape, but it keeps running out of memory. However, if I remove the line that appends the computed gradients to a list, the script runs through all of the epochs. I'm not entirely sure why this happens, and I'm also new to TensorFlow and to using GradientTape. Any suggestions or comments would be appreciated.


    #create a batch loop
    for x, y_true in train_dataset:

        #create a tape to record actions
        with tf.GradientTape(watch_accessed_variables=False) as tape:
            x_var = tf.Variable(x)
            tape.watch([model.trainable_variables, x_var])
            y_pred = model(x_var, training=True)
            tape.stop_recording()  # note: without a `with` block this call has no effect
            loss = los_func(y_true, y_pred)

        epoch_loss_avg.update_state(loss)
        epoch_accuracy.update_state(y_true, y_pred)

        #pdb.set_trace()
        #gradients w.r.t. the weights and w.r.t. the input batch
        gradients, something = tape.gradient(loss, (model.trainable_variables, x_var))
        #sa_input.append(tape.gradient(loss, x_var))
        del tape

        #append the input gradients, then apply the weight gradients
        sa_input.append(something)
        opti_func.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())


摇曳的蔷薇
1 Answer

呼唤远方

Since you're new to TF2, I'd suggest going through this guide. It covers training, evaluation, and prediction (inference) with models in TensorFlow 2.0 in two broad situations:

- When using built-in APIs for training and validation (such as model.fit(), model.evaluate(), model.predict()). This is covered in the section "Using built-in training & evaluation loops".
- When writing custom loops from scratch using eager execution and the GradientTape object. This is covered in the section "Writing your own training & evaluation loops from scratch".

Below is a program in which I compute the gradients after every epoch and append them to a list; at the end of the program, I convert the list to an array for simplicity. Note that this program throws an OOM error if I use a deeper network with more layers and larger filter sizes.

Code -

# Importing dependencies
%tensorflow_version 2.x
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras import datasets
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.layers import BatchNormalization
import numpy as np
import tensorflow as tf

# Import data
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Build the model
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dense(10))

# Model summary
model.summary()

# Compile the model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Define the gradient function
epoch_gradient = []
loss_fn = keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function
def get_gradient_func(model):
    with tf.GradientTape() as tape:
        logits = model(train_images, training=True)
        loss = loss_fn(train_labels, logits)
    grad = tape.gradient(loss, model.trainable_weights)
    model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
    return grad

# Define the required callback: record the gradients at the end of every epoch
class GradientCalcCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs={}):
        grad = get_gradient_func(model)
        epoch_gradient.append(grad)

epoch = 4
print(train_images.shape, train_labels.shape)

model.fit(train_images, train_labels, epochs=epoch,
          validation_data=(test_images, test_labels),
          callbacks=[GradientCalcCallback()])

# Convert to a 2-dimensional array of (epoch, gradients)
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epoch)

Output -

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_12 (Conv2D)           (None, 30, 30, 32)        896
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 15, 15, 32)        0
_________________________________________________________________
conv2d_13 (Conv2D)           (None, 13, 13, 64)        18496
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 6, 6, 64)          0
_________________________________________________________________
conv2d_14 (Conv2D)           (None, 4, 4, 64)          36928
_________________________________________________________________
flatten_4 (Flatten)          (None, 1024)              0
_________________________________________________________________
dense_11 (Dense)             (None, 64)                65600
_________________________________________________________________
dense_12 (Dense)             (None, 10)                650
=================================================================
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
_________________________________________________________________
(50000, 32, 32, 3) (50000, 1)
Epoch 1/4
1563/1563 [==============================] - 109s 70ms/step - loss: 1.7026 - accuracy: 0.4081 - val_loss: 1.4490 - val_accuracy: 0.4861
Epoch 2/4
1563/1563 [==============================] - 145s 93ms/step - loss: 1.2657 - accuracy: 0.5506 - val_loss: 1.2076 - val_accuracy: 0.5752
Epoch 3/4
1563/1563 [==============================] - 151s 96ms/step - loss: 1.1103 - accuracy: 0.6097 - val_loss: 1.1122 - val_accuracy: 0.6127
Epoch 4/4
1563/1563 [==============================] - 152s 97ms/step - loss: 1.0075 - accuracy: 0.6475 - val_loss: 1.0508 - val_accuracy: 0.6371
Total number of epochs run: 4

Hope this answers your question. Happy learning.
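As for why your original loop runs out of memory only when appending: tape.gradient returns live tensors (resident on the GPU when one is in use), so appending them to a Python list keeps every batch's gradients in device memory for the entire run. A common workaround is to copy each gradient to host memory before storing it. Below is a minimal sketch of that idea; the names model, train_dataset, los_func, opti_func, and sa_input follow the question's code, and the .numpy() conversion is an assumed fix rather than something from the original post.

import tensorflow as tf

sa_input = []  # per-batch input gradients, stored on the host

for x, y_true in train_dataset:
    x_var = tf.Variable(x)  # wrap the batch so its gradient can be taken
    with tf.GradientTape() as tape:
        tape.watch(x_var)
        y_pred = model(x_var, training=True)
        loss = los_func(y_true, y_pred)

    # gradients w.r.t. the weights (for the optimizer) and w.r.t. the input (e.g. for saliency)
    weight_grads, input_grads = tape.gradient(loss, (model.trainable_variables, x_var))
    opti_func.apply_gradients(zip(weight_grads, model.trainable_variables))

    # .numpy() copies the gradient to host memory, so no reference to the
    # device tensor is retained and the GPU memory can be reclaimed
    sa_input.append(input_grads.numpy())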
