My script does not seem to run on the GPU even though tensorflow-gpu is installed


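I have a machine with CUDA 10.1 installed, along with tensorflow and tensorflow-gpu 1.14.0. I am running a Python script that trains a CNN inside a virtualenv. In the source code I indicate that I want to use the GPU, as follows: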

    import os
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"

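However, when I run the script, the training phase takes a very long time to complete. Here is the output of nvidia-smi:

What I find strange is that GPU utilization is so low and that my Python script does not appear in the process list. Here is the output of some commands I tried: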

    >>> import tensorflow as tf
    >>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

The output is:

    2019-10-14 09:53:12.674719: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-10-14 09:53:12.679047: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
    2019-10-14 09:53:12.784993: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-14 09:53:12.785744: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f155c59650 executing computations on platform CUDA. Devices:
    2019-10-14 09:53:12.785771: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
    2019-10-14 09:53:12.806453: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
    2019-10-14 09:53:12.807345: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f15605dfc0 executing computations on platform Host. Devices:
    2019-10-14 09:53:12.807408: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): ,
    2019-10-14 09:53:12.807829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-10-14 09:53:12.808859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
    name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
    pciBusID: 0000:01:00.0
    2019-10-14 09:53:12.809148: I

慕斯王


2 Answers

守着一只汪

I recently sent a friend instructions for installing CUDA and tf-gpu with conda (since that is quick). After searching the internet for a while, my protocol looks like this:

    #########################
    # Install Miniconda
    #########################
    mkdir -p ~/install
    cd ~/install
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    # I guess on a mac you should do
    # wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh

    #########################
    # install nvidia driver
    # so these are the linux (ubuntu) commands
    # for mac, maybe one should follow the scheme
    # removing nvidia drivers first
    # and then download newest nvidia driver
    # and install it
    # and reboot
    #
    # If you are using a laptop without gpu, just skip this block
    #########################
    sudo apt purge 'nvidia-*'   # remove all nvidia drivers first (quoted so the shell does not expand the glob)
    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt install nvidia-driver-418
    sudo apt install nvidia-cuda-toolkit
    # reboot
    sudo reboot

    #########################
    # install machine learning stuff keras tensorflow-gpu
    #
    # if you are installing in a laptop without gpu,
    # replace 'tensorflow-gpu' by 'tensorflow'!
    #########################
    conda create --name keras
    conda activate keras
    conda install python ipython jupyter pandas scipy seaborn scikit-learn tensorflow-gpu keras pytest openpyxl graphviz

    #########################
    # finally, test a successful installation by:
    # entering:
    ipython
    # and there trying:
    from tensorflow.python.client import device_lib
    print(device_lib.list_local_devices())
    # should list gpu
    # sth like:
    physical_device_desc: "device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1", name: "/device:XLA_GPU:0"
    device_type: "XLA_GPU"
    memory_limit: 17179869184
    locality {
    }
    incarnation: 14085000268159177816
    physical_device_desc: "device: XLA_GPU device"
    ]
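A caveat on the final check: an entry of type XLA_GPU may show up even when TensorFlow's regular GPU kernels are unavailable, so it can help to look specifically for entries whose device_type is exactly "GPU". The following is a minimal sketch under that assumption. `gpu_device_names` is a hypothetical helper written for this illustration, not part of TensorFlow, and it only assumes each device entry exposes `name` and `device_type` attributes, as in the listing printed by `device_lib.list_local_devices()`.

```python
def gpu_device_names(devices):
    """Return the names of devices whose device_type is exactly "GPU".

    XLA_GPU and XLA_CPU entries are deliberately ignored.
    """
    return [d.name for d in devices if d.device_type == "GPU"]


if __name__ == "__main__":
    try:
        # Same import as in the installation check.
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        names = gpu_device_names(device_lib.list_local_devices())
        print("GPU devices:", names if names else "none found")
```

If this prints "none found" while nvidia-smi still shows the card, the driver is probably fine and the issue is more likely in the CUDA libraries TensorFlow tries to load.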


Smart猫小萌

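From your log: Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory. You installed CUDA 10.1, but TF-GPU 1.14 needs CUDA 10.0, so you need to install 10.0 as well (there is no need to uninstall 10.1; the two can coexist).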
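To confirm which CUDA runtime the dynamic loader can actually find, one option is to try opening the libraries directly from Python. This is only a sketch: `can_load` is a hypothetical helper, and the library file names are taken from the error message above and from the CUDA 10.1 install mentioned in the question. It uses the standard-library `ctypes` module, which raises `OSError` when a shared library cannot be opened.

```python
import ctypes


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name, else False."""
    try:
        ctypes.CDLL(lib_name)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    # File names are assumptions based on the error message and the question.
    for name in ("libcudart.so.10.0", "libcudart.so.10.1"):
        status = "loadable" if can_load(name) else "NOT loadable"
        print(name, status)
```

If libcudart.so.10.0 is reported as not loadable after CUDA 10.0 has been installed, its lib64 directory may be missing from LD_LIBRARY_PATH.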

