SLURM job fails when creating a Dask LocalCluster instance on an HPC cluster

I am queuing a job with the sbatch command and the following configuration:


#SBATCH --job-name=dask-test
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10
#SBATCH --mem=80G
#SBATCH --time=00:30:00
#SBATCH --tmp=10G
#SBATCH --partition=normal
#SBATCH --qos=normal

python ./dask-test.py

The Python script looks roughly like this:


import pandas as pd
import dask.dataframe as dd
import numpy as np

from dask.distributed import Client, LocalCluster

print("Generating LocalCluster...")
cluster = LocalCluster()
print("Generating Client...")
client = Client(cluster, processes=False)

print("Scaling client...")
client.scale(8)

data = dd.read_csv(
    BASE_DATA_SOURCE + '/Data-BIGDATFILES-*.csv',
    delimiter=';',
)

def get_min_dt():
    min_dt = data.datetime.min().compute()
    print("Min is {}".format(min_dt))

print("Getting min dt...")
get_min_dt()

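The first problem is that the text "Generating LocalCluster..." is printed 6 times, which makes me suspect that the script is being run several times concurrently. Second, after a few minutes with no output at all, I get the following message: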


/anaconda3/lib/python3.7/site-packages/distributed/node.py:155: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37396 instead
  http_address["port"], self.http_server.port


繁星点点滴滴
1 answer

慕斯王

After some research I found a solution. I am not entirely sure why it works, but I am quite sure that it does. The instantiation of LocalCluster and Client, and all the code after them (the code that will be executed in a distributed way), must not sit at the module level of the Python script. Instead, this code must be placed inside a function or inside the __main__ block, as follows:

import pandas as pd
import dask.dataframe as dd
import numpy as np

from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    print("Generating LocalCluster...")
    cluster = LocalCluster()
    print("Generating Client...")
    client = Client(cluster, processes=False)

    print("Scaling client...")
    client.scale(8)

    data = dd.read_csv(
        BASE_DATA_SOURCE + '/Data-BIGDATFILES-*.csv',
        delimiter=';',
    )

    def get_min_dt():
        min_dt = data.datetime.min().compute()
        print("Min is {}".format(min_dt))

    print("Getting min dt...")
    get_min_dt()

This simple change made the difference. I found the solution in this issue thread: https://github.com/dask/distributed/issues/2520#issuecomment-470817810
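A likely explanation for the repeated "Generating LocalCluster..." output is that LocalCluster starts its workers as separate processes, and each new process re-imports the main script, so any module-level code runs again in every worker. Keeping the cluster setup behind the __main__ guard avoids that. Building on this, here is a minimal sketch of sizing the cluster from the SLURM allocation rather than relying on defaults. The helper names local_cluster_kwargs and start_cluster are hypothetical, and the sketch assumes that SLURM exports SLURM_CPUS_PER_TASK and SLURM_MEM_PER_NODE (the latter in megabytes) for a job submitted with --cpus-per-task and --mem; check your site's SLURM setup before relying on it.

```python
import os


def local_cluster_kwargs(env=None, threads_per_worker=1):
    """Derive LocalCluster keyword arguments from SLURM environment variables.

    Hypothetical helper. Assumes SLURM_CPUS_PER_TASK holds the CPU count and
    SLURM_MEM_PER_NODE holds the job memory in megabytes.
    """
    env = os.environ if env is None else env
    cpus = int(env.get("SLURM_CPUS_PER_TASK", "1"))
    n_workers = max(1, cpus // threads_per_worker)
    kwargs = {"n_workers": n_workers, "threads_per_worker": threads_per_worker}

    mem_mb = env.get("SLURM_MEM_PER_NODE")
    if mem_mb is not None:
        # LocalCluster's memory_limit applies per worker, so split the total.
        kwargs["memory_limit"] = "{}MB".format(int(mem_mb) // n_workers)
    return kwargs


def start_cluster():
    """Create the cluster and client. Call this only under the __main__ guard."""
    # Imported here so the helper above can be used without dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(**local_cluster_kwargs())
    client = Client(cluster)
    return cluster, client


if __name__ == "__main__":
    # Only prints the derived settings; in a real job, call start_cluster()
    # here and run the Dask computation inside this block as well.
    print(local_cluster_kwargs())
```

With the sbatch settings from the question (10 CPUs, 80G), and under the assumptions above, this would ask for 10 single-threaded workers with roughly 8 GB each, instead of calling scale(8) after the fact.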