Reading a snappy parquet file on Windows crashes Python

I can't read snappy-compressed parquet files through pyarrow on Windows.


import dask.dataframe as dd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
dd_df = dd.from_pandas(df, npartitions=1)
dd_df.to_parquet("my_df.snappy.parquet", engine="pyarrow", compression="snappy")
dd_df_copy = dd.read_parquet("my_df.snappy.parquet", engine="pyarrow")
dd_df_copy.compute()  # <--- This is where it crashes
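Because the crash is a native fault inside arrow.dll rather than a Python exception, wrapping compute() in try/except cannot catch it. A minimal sketch, assuming you want the calling interpreter to survive and report the failure: run the reproduction in a child process with subprocess.run and inspect its exit code. run_isolated and REPRO are hypothetical names, not part of dask or pyarrow.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 300.0) -> int:
    """Run `code` in a fresh interpreter and return the child's exit code.

    A native crash (such as an APPCRASH in arrow.dll) terminates only the
    child process; the parent just sees a nonzero return code.
    """
    completed = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    return completed.returncode


# Hypothetical reproduction snippet, mirroring the failing read above.
REPRO = """
import dask.dataframe as dd
dd.read_parquet("my_df.snappy.parquet", engine="pyarrow").compute()
"""
```

On the affected machine, calling run_isolated(REPRO) would be expected to return a nonzero code instead of taking down the interpreter that called it; on Windows that code is the unsigned 32-bit exception code from the crash report.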

I reproduced this issue in a clean Anaconda environment with Python 3.8. After creating the environment, I ran pip install "dask[complete]" and then pip install pyarrow.
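When a crash depends on which build of a package got installed, recording the exact versions makes the report reproducible. A minimal sketch, assuming Python 3.8 or later, where importlib.metadata is part of the standard library; package_versions and environment_report are hypothetical helper names.

```python
from importlib import metadata  # standard library since Python 3.8
import platform
import sys
from typing import Dict, Iterable, Optional


def package_versions(
    names: Iterable[str] = ("dask", "pyarrow", "pandas", "numpy"),
) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report() -> str:
    """One line for the interpreter/OS, then one line per package."""
    lines = [f"Python {sys.version.split()[0]} on {platform.platform()}"]
    for name, version in package_versions().items():
        lines.append(f"{name}: {version or 'not installed'}")
    return "\n".join(lines)


print(environment_report())
```

Running this in both the pip-based and the conda-based environment would show whether the two crashes involve the same pyarrow release.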


The error is:


Problem signature:
  Problem Event Name:   APPCRASH
  Application Name: python.exe
  Application Version:  3.8.3150.1013
  Application Timestamp:    5ed53446
  Fault Module Name:    arrow.dll
  Fault Module Version: 0.0.0.0
  Fault Module Timestamp:   5ebd3029
  Exception Code:   c000001d
  Exception Offset: 00000000007abfc7
  OS Version:   6.3.9600.2.0.0.16.7
  Locale ID:    1033
  Additional Information 1: d8e4
  Additional Information 2: d8e42c04b828d96accf490cd13472bea
  Additional Information 3: aebe
  Additional Information 4: aebe917bfb5c1b58e884baa1f9c3d3d2
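The Exception Code c000001d in these reports is the Windows NTSTATUS value STATUS_ILLEGAL_INSTRUCTION: the faulting code in arrow.dll executed an instruction the CPU rejected. One plausible reading, not confirmed anywhere in this thread, is a binary built for an instruction set the machine lacks. A minimal sketch for decoding such codes, whether they come from a crash report as hex text or from a child process exit code (which may appear signed or unsigned); describe_exception_code and its deliberately tiny lookup table are hypothetical.

```python
from typing import Union

# Deliberately incomplete table of well-known NTSTATUS crash codes.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
}


def describe_exception_code(code: Union[str, int]) -> str:
    """Accept a hex string like 'c000001d' or an int exit code."""
    if isinstance(code, str):
        value = int(code, 16)
    else:
        # Normalize signed 32-bit values (e.g. -1073741795) to unsigned.
        value = code & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(value, "unknown")
    return f"0x{value:08X} {name}"


print(describe_exception_code("c000001d"))
```

The same code shows up as 3221225501 when read unsigned and as -1073741795 when read as a signed 32-bit integer, which is why the helper masks integers before the lookup.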

A similar crash occurs when I install the packages with conda install -c conda-forge dask pyarrow instead:


Problem signature:
  Problem Event Name:   APPCRASH
  Application Name: python.exe
  Application Version:  3.8.3150.1013
  Application Timestamp:    5ed53446
  Fault Module Name:    arrow.dll
  Fault Module Version: 0.0.0.0
  Fault Module Timestamp:   5ecf56ac
  Exception Code:   c000001d
  Exception Offset: 0000000000521587
  OS Version:   6.3.9600.2.0.0.16.7
  Locale ID:    1033
  Additional Information 1: e863
  Additional Information 2: e8638a01b9fb70505b0604ef9b98f3c6
  Additional Information 3: 1e47
  Additional Information 4: 1e47c852f479606e071f3ea8f80878a1


动漫人物
1 Answer

holdtom

As of July 1, 2020, updating the packages fixed the problem. I believe it was the pyarrow update that resolved it.