Interacting with Blob storage files from a Databricks notebook

In an Azure Databricks notebook, I am trying to transform some CSV files in Blob storage with the following code:


    import os
    import glob
    import pandas as pd

    os.chdir(r'wasbs://dalefactorystorage.blob.core.windows.net/dale')

    allFiles = glob.glob("*.csv") # match your csvs

    for file in allFiles:
        df = pd.read_csv(file)
        df = df.iloc[4:,] # read from row 4 onwards.
        df.to_csv(file)
        print(f"{file} has removed rows 0-3")

Unfortunately, I get the following error:


    FileNotFoundError: [Errno 2] No such file or directory: 'wasbs://dalefactorystorage.blob.core.windows.net/dale'


Am I missing something? (I am completely new to this.)


潇湘沐
2 answers

DIEA

If you want to use pandas to read CSV files from Azure Blob storage, process them, and write them back to Blob storage from Azure Databricks, I suggest mounting the Blob storage container into the Databricks File System (DBFS) first and then working on the mounted path. os.chdir, glob, and pandas operate on local file paths, so they cannot open a wasbs:// URL directly; once mounted, the container is reachable as a local path under /dbfs/mnt/<mount-name>. For more details, see the Azure Databricks documentation.

For example:

Mount the Azure Blob container

    dbutils.fs.mount(
      source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
      mount_point = "/mnt/<mount-name>",
      extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net":"<account access key>"})

Process the CSV files

    import os
    import glob
    import pandas as pd

    os.chdir(r'/dbfs/mnt/<mount-name>/<>')
    allFiles = glob.glob("*.csv") # match your csvs
    for file in allFiles:
        print(f"The old content of file {file}:")
        df = pd.read_csv(file, header=None) # header=None: treat every line as data
        print(df)
        df = df.iloc[4:,] # keep everything from row 4 onwards
        df.to_csv(file, index=False, header=False) # do not add an index column or header row
        print(f"The new content of file {file}:")
        df = pd.read_csv(file, header=None)
        print(df)
        break # stop after the first file (demo only; remove to process all files)
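The row-dropping step itself can be tried without a cluster. The following is a minimal sketch, not the answer's exact code: it assumes a temporary local directory stands in for the mounted path, and the helper name drop_leading_rows and the sample file are made up for illustration. It also passes full paths to glob instead of calling os.chdir, which avoids changing the notebook's working directory.

```python
import glob
import os
import tempfile

import pandas as pd


def drop_leading_rows(directory, n_rows=4):
    """Rewrite every CSV in `directory` without its first `n_rows` lines.

    Hypothetical helper for illustration; on Databricks, `directory`
    would be a mounted path under /dbfs/mnt/.
    """
    processed = []
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        # header=None treats every line as data, so no row is consumed as column names
        df = pd.read_csv(path, header=None)
        df = df.iloc[n_rows:]
        # index=False and header=False stop pandas from adding an index column or header row
        df.to_csv(path, index=False, header=False)
        processed.append(path)
    return processed


# Local stand-in for the mounted container: a temp directory with one sample file
demo_dir = tempfile.mkdtemp()
sample_path = os.path.join(demo_dir, "sample.csv")
with open(sample_path, "w") as f:
    f.write("a,1\nb,2\nc,3\nd,4\ne,5\nf,6\n")

processed = drop_leading_rows(demo_dir)
with open(sample_path) as f:
    remaining = f.read()
print(remaining)
```

Only the lines after the first four should remain in the sample file. Note that the question's original code omitted header=None and index=False, so each run would have consumed the first line as a header and written an extra index column.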

慕雪6442864

An alternative is to read the file from Blob storage directly into a Spark DataFrame and then convert it to a pandas DataFrame:

    # set the storage account access key (no mount is needed for this approach)
    spark.conf.set("fs.azure.account.key.storageaccountname.blob.core.windows.net", "storageaccesskey")
    dfspark = spark.read.csv("wasbs://containername@storageaccountname.blob.core.windows.net/filename.csv", header="true")

    # convert from Spark DataFrame to pandas DataFrame
    df = dfspark.toPandas()
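Both answers rely on the same two strings: a wasbs:// URL in the container@account form, and a configuration key that carries the storage account name. The sketch below builds them with two hypothetical helpers (wasbs_url and account_key_conf are names invented here, not part of any library). It assumes that in the question's path, dale is the container and dalefactorystorage is the storage account, which the question suggests but does not confirm.

```python
def wasbs_url(container, account, path=""):
    """Build a wasbs:// URL in the container@account form (hypothetical helper)."""
    base = f"wasbs://{container}@{account}.blob.core.windows.net"
    return f"{base}/{path.lstrip('/')}" if path else base


def account_key_conf(account):
    """Name of the configuration entry that holds the account access key (hypothetical helper)."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


# Assumed split of the question's path: container "dale", account "dalefactorystorage"
url = wasbs_url("dale", "dalefactorystorage", "filename.csv")
conf_key = account_key_conf("dalefactorystorage")
print(url)
print(conf_key)
```

In a notebook, these values would be passed to spark.conf.set(conf_key, "<account access key>") and spark.read.csv(url, header="true"), as in the answer above.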