I'm running into the following strange behavior in pandas (pandas == 0.23.1):
import pandas as pd
df = pd.DataFrame({'t1': ["a","b","c"]*10000, 't2': ["x","y","z"]*10000, 'i1': list(range(5000))*6, 'i2': list(range(5000))*6, 'dummy':0})
# works fast with less memory
piv = df.pivot_table(values='dummy', index=['i1','i2'], columns=['t1','t2'])
d2 = df.copy()
d2.t1 = d2.t1.astype('category')
d2.t2 = d2.t2.astype('category')
# needs > 20 GB of memory and takes forever
piv2 = d2.pivot_table(values='dummy', index=['i1','i2'], columns=['t1','t2'])
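I'd like to know whether this is expected and I'm doing something wrong, or whether this is a bug in pandas. Shouldn't using the category dtype in place of str be fairly transparent (for this use case)?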
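One possible explanation, offered as a sketch and not a confirmed diagnosis: when any group key is categorical, `groupby` with `observed=False` returns the Cartesian product of all group key values, including combinations that never occur in the data. Assuming `pivot_table` groups on the index and column keys internally, the frame above would expand to 5000 × 5000 × 3 × 3 = 225 million key combinations. The small frame `small` below is hypothetical and only illustrates the effect with `groupby` directly.

```python
import pandas as pd

# Hypothetical small frame: 4 distinct i1 values, 2 categories in t1.
small = pd.DataFrame({
    "i1": [0, 1, 2, 3],
    "t1": pd.Categorical(["a", "b", "a", "b"]),
    "dummy": 0,
})

# observed=False: the result is expanded to every (i1, t1) combination,
# and combinations that never occur get NaN.
full = small.groupby(["i1", "t1"], observed=False)["dummy"].mean()

# observed=True: only the combinations present in the data are kept.
seen = small.groupby(["i1", "t1"], observed=True)["dummy"].mean()

print(len(full), len(seen))
```

If this is indeed the cause, keeping the columns as str for the pivot, or grouping with `observed=True` and reshaping afterwards, may avoid the blow-up.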