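I'm new to Google Cloud Platform and I'm trying to train a model on a TPU. I followed this tutorial to set up the TPU with Google Colab. All of the code below follows the tutorial.
Here are the steps I completed: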
import datetime
import json
import os
import pprint
import random
import string
import sys
import tensorflow as tf
assert 'COLAB_TPU_ADDR' in os.environ, 'ERROR: Not connected to a TPU runtime; please see the first cell in this notebook for instructions!'
TPU_ADDRESS = 'grpc://' + os.environ['COLAB_TPU_ADDR']
print('TPU address is => ', TPU_ADDRESS)
from google.colab import auth
auth.authenticate_user()
with tf.Session(TPU_ADDRESS) as session:
  print('TPU devices:')
  pprint.pprint(session.list_devices())
  # Upload credentials to TPU.
  with open('/content/adc.json', 'r') as f:
    auth_info = json.load(f)
  tf.contrib.cloud.configure_gcs(session, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.
Output:
TPU address is => grpc://10.4.89.154:8470
Provide my BUCKET name and OUTPUT DIRECTORY name:
BUCKET = 'my_xlnet' #@param {type:"string"}
assert BUCKET, '*** Must specify an existing GCS bucket name ***'
output_dir_name = 'xlnet_output' #@param {type:"string"}
BUCKET_NAME = 'gs://{}'.format(BUCKET)
OUTPUT_DIR = 'gs://{}/{}'.format(BUCKET,output_dir_name)
tf.gfile.MakeDirs(OUTPUT_DIR)
print('***** Model output directory: {} *****'.format(OUTPUT_DIR))
Move the pretrained model to the GCS bucket:
!gsutil mv /content/xlnet_extension_tf/model/xlnet_cased_L-24_H-1024_A-16 $BUCKET_NAME
Output:
...
Operation completed over 5 objects/1.3 GiB.
Then run the main script:
!python /content/xlnet_extension_tf/run_coqa.py \
--use_tpu=True \
--tpu_name=grpc://10.4.89.154:8470 \
--spiece_model_file=$BUCKET_NAME/xlnet_cased_L-24_H-1024_A-16/spiece.model \
--model_config_path=$BUCKET_NAME/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json \
--init_checkpoint=$BUCKET_NAME/xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt \
...
Then I got this error:
OSError: Not found: "gs://my_xlnet/xlnet_cased_L-24_H-1024_A-16/spiece.model": No such file or directory Error #2
This is the GCS bucket screen:
I don't understand why this error occurs, because the pretrained model was moved into the bucket successfully.
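One thing that may help narrow this down, offered as a hedged sketch rather than a confirmed diagnosis: the wording "No such file or directory Error #2" resembles what ordinary local file I/O reports (errno 2, ENOENT), and plain local file APIs do not understand gs:// URIs. The helper name try_local_open below is hypothetical, and whether run_coqa.py actually loads spiece.model through a local-only reader is an assumption I have not verified.

```python
import errno
import os
import tempfile


def try_local_open(path):
    """Return None if plain open() can read `path`, else the OSError errno.

    Hypothetical helper for illustration only; it is not part of run_coqa.py.
    """
    try:
        with open(path, 'rb'):
            return None
    except OSError as err:
        return err.errno


# Path copied from the error message above.
GCS_PATH = 'gs://my_xlnet/xlnet_cased_L-24_H-1024_A-16/spiece.model'

# Plain open() treats the gs:// URI as a relative local path, so it fails
# even if the object really exists in the bucket.
gcs_errno = try_local_open(GCS_PATH)
print('gs:// path via open():', gcs_errno, errno.errorcode.get(gcs_errno))

# A real local file opens without error, for contrast.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'placeholder')
local_errno = try_local_open(tmp.name)
os.remove(tmp.name)
print('local path via open():', local_errno)
```

If that assumption holds, it might be worth trying a local copy of spiece.model for --spiece_model_file while leaving the config and checkpoint paths on GCS, since those may be read through GCS-aware TensorFlow file APIs.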
Does anyone know how to solve this problem?