Pandas loop optimization

Is there a better way (performance-wise) to perform the following loop in pandas (assuming df is a DataFrame)?


for i in range(len(df)):
    if df['signal'].iloc[i] == 0:   # if the signal is negative
        if df['position'].iloc[i - 1] - 0.02 < -1:   # if the row above - 0.02 < -1, set the current row to -1
            df['position'].iloc[i] = -1
        else:   # if the row above - 0.02 >= -1, subtract 0.02 from that value
            df['position'].iloc[i] = df['position'].iloc[i - 1] - 0.02
    elif df['signal'].iloc[i] == 1:     # if the signal is positive
        if df['position'].iloc[i - 1] + 0.02 > 1:     # if the row above + 0.02 > 1, set the current row to 1
            df['position'].iloc[i] = 1
        else:   # if the row above + 0.02 <= 1, add 0.02 to that value
            df['position'].iloc[i] = df['position'].iloc[i - 1] + 0.02

I would appreciate any help, as I am just getting started with pandas and am probably missing something fundamental.


Source CSV data:


Date,sp500,sp500 MA,UNRATE,UNRATE MA,signal,position
2000-01-01,,,4.0,4.191666666666665,1,0
2000-01-02,,,4.0,4.191666666666665,1,0
2000-01-03,102.93,95.02135,4.0,4.191666666666665,1,0
2000-01-04,98.91,95.0599,4.0,4.191666666666665,1,0
2000-01-05,99.08,95.11245000000001,4.0,4.191666666666665,1,0
2000-01-06,97.49,95.15450000000001,4.0,4.191666666666665,1,0
2000-01-07,103.15,95.21575000000001,4.0,4.191666666666665,1,0
2000-01-08,103.15,95.21575000000001,4.0,4.191666666666665,1,0
2000-01-09,103.15,95.21575000000001,4.0,4.191666666666665,1,0


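Update: all of the answers below (at the time of writing) produce a constant 0.02 for position, which differs from my naive loop approach. In other words, I am looking for a solution that gives 0.02, 0.04, 0.06, 0.08, etc. for the position column.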


繁星coding
3 Answers

回首忆惘然

Thanks for adding the data and example output. First, I am fairly sure you cannot vectorize this, because each calculation depends on the output of the previous one. So this is the best I could do.

Your method takes about 0.116999 seconds on my machine. This one takes about 0.0039999 seconds.

It is not vectorized, but it gives a good speedup, because working on a plain list and adding it back to the DataFrame at the end is faster than writing into the DataFrame row by row.

def myfunc(pos_pre, signal):
    if signal == 0:  # if the signal is negative
        # step down by 0.02 from the previous value
        pos = pos_pre - 0.02
        if pos < -1:  # if the row above - 0.02 < -1, set the current row to -1
            pos = -1
    elif signal == 1:
        # step up by 0.02 from the previous value
        pos = pos_pre + 0.02
        if pos > 1:  # if the row above + 0.02 > 1, set the current row to 1
            pos = 1
    return pos

# Set the first position value explicitly, because your method does not
# technically calculate it correctly: there is no "position minus 1" for
# row 0, i.e. it will always be 0.02.
new_pos = [0.02]
# skip index zero since there is no position 0 minus 1
for i in range(1, len(df)):
    new_pos.append(myfunc(pos_pre=new_pos[i-1], signal=df['signal'].iloc[i]))

df['position'] = new_pos

Output:

df.position
0    0.02
1    0.04
2    0.06
3    0.08
4    0.10
5    0.12
6    0.14
7    0.16
8    0.18
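The same list-based recurrence can also be expressed with `itertools.accumulate` from the standard library. This is only a sketch and not part of the answer above: the helper name `clamped_positions`, its `step` and `start` parameters, and the choice to leave the position unchanged for any signal other than 0 or 1 are assumptions made for illustration. The `initial` argument needs Python 3.8 or later.

```python
from itertools import accumulate


def clamped_positions(signals, step=0.02, start=0.0):
    """Running position that moves by +/- step per signal, clamped to [-1, 1].

    Hypothetical helper: signal 1 adds step, signal 0 subtracts step,
    and any other value leaves the position unchanged (an assumption).
    """
    def update(prev, signal):
        if signal == 1:
            return min(prev + step, 1.0)
        if signal == 0:
            return max(prev - step, -1.0)
        return prev

    # accumulate feeds each result back in as `prev`; with `initial` the
    # output has one extra leading element (the start value), so drop it.
    return list(accumulate(signals, update, initial=start))[1:]


signals = [1] * 9  # the 'signal' column from the sample CSV
positions = clamped_positions(signals)
print([round(p, 2) for p in positions])
```

With a DataFrame, this could be applied as `df['position'] = clamped_positions(df['signal'].tolist())`. It is still a Python-level loop under the hood, so any gain over the explicit list version is likely to be modest.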

慕森卡

Don't use a loop. pandas specializes in vectorized operations, for example for signal == 0:

import numpy as np

pos_shift = df['position'].shift() - 0.02
m1 = df['signal'] == 0
m2 = pos_shift < -1
df.loc[m1 & m2, 'position'] = -1
df['position'] = np.where(m1 & ~m2, pos_shift, df['position'])

You can write something similar for signal == 1.
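As a rough sketch of how both branches might be combined in one pass, the snippet below uses `np.select` on a small stand-in for the sample data. The names `step`, `prev`, `conditions`, and `choices` are illustrative, and filling the first row's missing predecessor with 0.0 is an assumption. Because `shift()` reads the previous row's existing position rather than the freshly computed one, every row here comes out as 0.02 instead of a running total, which matches the behavior described in the question's update.

```python
import numpy as np
import pandas as pd

# Small stand-in for the sample CSV: signal is always 1, position starts at 0.
df = pd.DataFrame({"signal": [1] * 9, "position": [0.0] * 9})

step = 0.02
# Previous row's *existing* position; fill_value=0.0 for the first row
# is an assumption made for this sketch.
prev = df["position"].shift(fill_value=0.0)

conditions = [df["signal"] == 0, df["signal"] == 1]
choices = [
    (prev - step).clip(lower=-1),  # signal == 0: step down, floor at -1
    (prev + step).clip(upper=1),   # signal == 1: step up, cap at 1
]
# Rows whose signal is neither 0 nor 1 keep their current position.
df["position"] = np.select(conditions, choices, default=df["position"])
print(df["position"].tolist())
```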

汪汪一只猫

There are many better ways, but this approach should also work:

# previous row's position (the question compares against position, not signal)
df['previous'] = df.position.shift()

def get_signal_value(row):
    if row.signal == 0:
        compare = row.previous - 0.02
        if compare < -1:
            return -1
        else:
            return compare
    elif row.signal == 1:
        compare = row.previous + 0.02  # same 0.02 step as in the question
        if compare > 1:
            return 1
        else:
            return compare

df['new_signal'] = df.apply(get_signal_value, axis=1)