猿问

cuDF error when processing a large number of parquet files

I have 2000 parquet files in a directory. Each parquet file is about 20 MB in size, compressed with SNAPPY. Each parquet file has rows that look like this:


+------------+-----------+-----------------+

| customerId | productId | randomAttribute |

+------------+-----------+-----------------+

| ID1        | PRODUCT1  | ATTRIBUTE1      |

| ID2        | PRODUCT2  | ATTRIBUTE2      |

| ID2        | PRODUCT3  | ATTRIBUTE3      |

+------------+-----------+-----------------+

Every column entry is a string. I am using a p3.8xlarge EC2 instance with the following configuration:


Memory: 244 GB

vCPUs: 32

GPU RAM: 64 GB (16 GB per GPU)

GPUs: 4 × Tesla V100

I am trying the following code:


import cudf

def read_all_views(parquet_file_lst):

    df_lst = []

    for file in parquet_file_lst:

        # read only the two columns that are needed
        df = cudf.read_parquet(file, columns=['customerId', 'productId'])

        df_lst.append(df)

    return cudf.concat(df_lst)

This crashes after processing the first 180 files with the following runtime error:


Traceback (most recent call last):

File "<stdin>", line 1, in <module>

File "<stdin>", line 9, in read_all_views

File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/cudf/io/parquet.py", line 54, in read_parquet

    use_pandas_metadata,

File "cudf/_lib/parquet.pyx", line 25, in cudf._lib.parquet.read_parquet

File "cudf/_lib/parquet.pyx", line 80, in cudf._lib.parquet.read_parquet

RuntimeError: rmm_allocator::allocate(): RMM_ALLOC: unspecified launch failure

At any given time, only about 10% of GPU and CPU RAM is in use. Any ideas on how to debug this, or what a workaround might be?


HUWWW
285 views · 1 answer

1 Answer

GCT1015

cuDF is a single-GPU library. 2000 files of 20 MB each is roughly 40 GB of data, which is more than a single V100 GPU can hold in memory. For workflows that need more than a single GPU, cuDF relies on Dask. The following example shows how you can use cuDF + Dask to read data into distributed GPU memory across multiple GPUs in a single node. It doesn't answer your debugging question, but should hopefully solve your problem.

First, I use a few lines of code to create a Dask cluster of two GPUs.

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
import dask_cudf

cluster = LocalCUDACluster() # by default use all GPUs in the node. I have two.
client = Client(cluster)
client

# The print output of client:
#
# Client
# Scheduler: tcp://127.0.0.1:44764
# Dashboard: http://127.0.0.1:8787/status
# Cluster
# Workers: 2
# Cores: 2
# Memory: 404.27 GB

Next, I'll create a couple of parquet files for this example.

import os
import cudf
from cudf.datasets import randomdata

if not os.path.exists('example_output'):
    os.mkdir('example_output')

for x in range(2):
    df = randomdata(nrows=10000,
                    dtypes={'a':int, 'b':str, 'c':str, 'd':int},
                    seed=12)
    df.to_parquet('example_output/df')

Let's look at the memory on each of my GPUs with nvidia-smi.

nvidia-smi
Thu Sep 26 19:13:46 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
| N/A   51C    P0    29W /  70W |   6836MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:D8:00.0 Off |                    0 |
| N/A   47C    P0    28W /  70W |   5750MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Note these two values: 6836 MiB on GPU 0 and 5750 MiB on GPU 1 (I happen to already have unrelated data in memory on these GPUs).

Now let's read the entire directory of two parquet files with Dask cuDF and then persist it. Persisting forces computation; Dask execution is lazy, so just calling read_parquet only adds a task to the task graph. ddf is a Dask DataFrame.

ddf = dask_cudf.read_parquet('example_output/df')
ddf = ddf.persist()

Now let's look at nvidia-smi again.

Thu Sep 26 19:13:52 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.104      Driver Version: 410.104      CUDA Version: 10.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:AF:00.0 Off |                    0 |
| N/A   51C    P0    29W /  70W |   6938MiB / 15079MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:D8:00.0 Off |                    0 |
| N/A   47C    P0    28W /  70W |   5852MiB / 15079MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

Dask handled distributing our data across both GPUs for us.
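The sizing argument at the start of this answer can be checked with quick arithmetic, using the file count and sizes from the question and the 16 GB per V100 from the instance specs:

```python
# back-of-the-envelope check: does the dataset fit on one GPU?
num_files = 2000      # from the question
file_size_mb = 20     # ~20 MB each, SNAPPY-compressed
per_gpu_gb = 16       # one V100 in a p3.8xlarge has 16 GB of GPU RAM

total_gb = num_files * file_size_mb / 1024
print(f"total compressed data: {total_gb:.1f} GB vs {per_gpu_gb} GB per GPU")
# the compressed files alone exceed a single GPU's memory, before any
# decompression or DataFrame overhead is even counted
```

Note this is the compressed size; the in-memory string columns will be larger still once decoded, so the single-GPU loop was bound to run out of memory even sooner.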