Assign column where equal to value - pandas df

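I'm trying to assign values in a pandas df. Specifically, for the df below, I want to use Column['On'] to determine how many values are currently on. I then want to assign these values in groups of 3, so that: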


1-3 = 1
4-6 = 2
7-9 = 3, etc.
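Written as a formula, the grouping rule above is a division by 3 rounded up, which can also be expressed with integer (floor) division for positive values of On:

```latex
P = \left\lceil \frac{\mathrm{On}}{3} \right\rceil
  = \left\lfloor \frac{\mathrm{On} - 1}{3} \right\rfloor + 1,
\qquad \text{e.g. } \mathrm{On} = 7:\;
\left\lceil \tfrac{7}{3} \right\rceil = \left\lfloor \tfrac{6}{3} \right\rfloor + 1 = 3.
```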

This can go up to 20-30 values. I have considered np.where, but it isn't very efficient and it returns an error.
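np.where handles one condition at a time, so chaining it for 20-30 ranges gets unwieldy. A multi-branch NumPy alternative is np.select, which takes a list of conditions and a matching list of choices. The sketch below is not from either answer; it assumes at most 10 groups (On values 1 to 30, per the range mentioned above), and `max_group` is a hypothetical parameter:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"On": [1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1]})

max_group = 10  # hypothetical: 10 groups cover On values 1..30

# One boolean mask per group g, covering the closed range [3g-2, 3g].
conditions = [(df["On"] >= 3 * g - 2) & (df["On"] <= 3 * g)
              for g in range(1, max_group + 1)]
choices = list(range(1, max_group + 1))

# np.select takes the choice of the first matching condition; 0 if none match.
df["P"] = np.select(conditions, choices, default=0)
print(df["P"].tolist())
```

This still builds one mask per group, so it scales with the number of groups; it mainly replaces many separate np.where calls with a single call.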


import pandas as pd
import numpy as np

d = {'On': [1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1]}

df = pd.DataFrame(data=d)

This call works:


df['P'] = np.where(df['On'] == 1, df['On'],1)
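Although this call runs, it does not compute the groups. np.where(cond, x, y) takes x where cond is True and y elsewhere; here x is df['On'], which equals 1 exactly where the condition holds, and y is 1, so every row ends up as 1. A minimal check:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"On": [1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1]})

# x = df["On"] where On == 1 (i.e. the value 1), y = 1 everywhere else.
df["P"] = np.where(df["On"] == 1, df["On"], 1)
print(df["P"].unique())  # prints [1]
```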

However, if I try to apply this to other values, I get an error:


df = df['P'] = np.where(df['On'] == 1, df['On'],1)
df = df['P'] = np.where(df['On'] == 2, df['On'],1)
df = df['P'] = np.where(df['On'] == 3, df['On'],1)


IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
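The IndexError comes from the chained assignment `df = df['P'] = ...`, not from np.where. In `a = b = value`, Python evaluates `value` once and then assigns it to the targets from left to right, so `df` is first rebound to the NumPy array returned by np.where, and `df['P'] = ...` is then attempted on that array, which NumPy rejects. A small reproduction, plus the fix of assigning only the column (even fixed, each separate np.where call would overwrite the whole 'P' column):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"On": [1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1]})

caught = None
try:
    # Step 1: df = <ndarray>   Step 2: <ndarray>["P"] = <ndarray>  -> IndexError
    df = df["P"] = np.where(df["On"] == 1, df["On"], 1)
except IndexError as exc:
    caught = exc

print(type(df).__name__)  # prints ndarray: the DataFrame binding is already gone

# Fix: assign the column only, so the name keeps pointing at the DataFrame.
fixed = pd.DataFrame({"On": [1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1]})
fixed["P"] = np.where(fixed["On"] == 1, fixed["On"], 1)
```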



慕斯王
2 Answers

饮歌长啸

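You can use Series masks with loc:

df['P'] = float('nan')
# df.loc[mask, 'P'] writes into df directly; df['P'].loc[mask] = ... is
# chained assignment and may only modify a copy of the column.
df.loc[(df['On'] >= 1) & (df['On'] <= 3), 'P'] = 1
df.loc[(df['On'] >= 4) & (df['On'] <= 6), 'P'] = 2
# ...etc

It's easy to extend this with a loop:

j = 1
for i in range(1, 20):
    # group i covers On values j .. j+2
    df.loc[(df['On'] >= j) & (df['On'] <= (j + 2)), 'P'] = i
    j += 3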
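A bin-based alternative that neither answer uses is pandas' pd.cut, which assigns each value to a right-closed interval (0, 3], (3, 6], ... in one call. With labels=False it returns 0-based bin numbers, so adding 1 gives the groups. This is a sketch assuming On stays between 1 and 30; `max_on` is a hypothetical parameter:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"On": [1, 2, 3, 4, 5, 6, 7, 7, 6, 5, 4, 3, 2, 1]})

max_on = 30  # hypothetical upper bound on On
edges = np.arange(0, max_on + 3, 3)  # 0, 3, 6, ..., 30

# Bins are right-closed by default: (0, 3] -> 0, (3, 6] -> 1, ...
df["P"] = pd.cut(df["On"], bins=edges, labels=False) + 1
print(df["P"].tolist())
```

Values outside the edges would come back as NaN, so the bound has to cover the data.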

沧海一幻觉

With some basic math and vectorization you can get much better performance.

import pandas as pd
import numpy as np

n = 1000
df = pd.DataFrame({"On": np.random.randint(1, 20, n)})

AlexG's solution

%%time
j = 1
df["P"] = np.nan
for i in range(1, 20):
    df['P'].loc[(df['On'] >= j) & (df['On'] <= (j+2))] = i
    j += 3

CPU times: user 2.11 s, sys: 0 ns, total: 2.11 s
Wall time: 2.11 s

Proposed solution

%%time
df["P"] = np.ceil(df["On"]/3)

CPU times: user 2.48 ms, sys: 0 ns, total: 2.48 ms
Wall time: 2.15 ms

The speedup is roughly 1000x (2.11 s vs. 2.15 ms wall time).
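One detail of the vectorized version: np.ceil returns floats (1.0, 2.0, ...). If integer group labels are wanted, the result can be cast, or the same value can be computed with integer arithmetic, since for positive On, ceil(On / 3) equals (On + 2) // 3. A minimal sketch (the `P_int` column name is only for comparison):

```python
import numpy as np
import pandas as pd

n = 1000
df = pd.DataFrame({"On": np.random.randint(1, 20, n)})

# Round up On/3, then cast the float result to integer labels.
df["P"] = np.ceil(df["On"] / 3).astype(int)

# Pure integer form: adding 2 before floor division rounds up for positive On.
df["P_int"] = (df["On"] + 2) // 3
```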