Splitting a DataFrame in Pandas based on timestamp continuity

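I want to create a new DataFrame containing the rows whose last column holds either 1.0 or NaN, where I only take the NaNs that sit below a 1.0. However, I also want to include the rows of a Result 0.0 block, as long as that block covers at most two timestamps (in the simple example below, I would take the rows with timestamps 00-00-30 and 00-00-40).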


Timestamp  Value  Result
00-00-10   34567  1.0
00-00-20   45425
00-00-30   46773  0.0
00-00-40   64567
00-00-50   25665  1.0
00-01-00   25678
00-01-10   84358
00-01-20   76869  0.0
00-01-30   95830
00-01-40   87890
00-01-50   99537
00-02-00   85957  1.0
00-02-10   58840

I split it into two DataFrames:


df_1 = data[(data['Result'].isnull() & (data['Result'].ffill() == 1)) | (data['Result'] == 1)]

df_2 = data[(data['Result'].isnull() & (data['Result'].ffill() == 0)) | (data['Result'] == 0)]

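How can I split df_2 into chunks so that the timestamps within each chunk are consecutive/uninterrupted? (I could then check whether each chunk exceeds the allowed length and, if it does not, append it to df_1 and sort the result by time.)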


So the output I want is:


Timestamp  Value  Result
00-00-10   34567  1.0
00-00-20   45425
00-00-30   46773  0.0
00-00-40   64567
00-00-50   25665  1.0
00-01-00   25678
00-01-10   84358
00-02-00   85957  1.0
00-02-10   58840


婷婷同学_
2 answers

慕盖茨4494581

Just build a single mask that covers all three conditions, then subset the original DataFrame with it:

mask = (
    (df.Result == 1)
    | (df.Result.ffill() == 1)
    | ((df.Result.ffill() == 0)
       & (df.groupby((df.Result.ffill() != df.Result.ffill().shift()).cumsum()).Result.transform('size') <= 2))
)

Output of df[mask]:

   Timestamp  Value  Result
0   00-00-10  34567     1.0
1   00-00-20  45425     NaN
2   00-00-30  46773     0.0
3   00-00-40  64567     NaN
4   00-00-50  25665     1.0
5   00-01-00  25678     NaN
6   00-01-10  84358     NaN
11  00-02-00  85957     1.0
12  00-02-10  58840     NaN

Explanation: you have three conditions.

Keep the row if Result == 1.
Keep the row if it is a NaN below a Result == 1 (this is what .ffill() takes care of).
The third condition measures the size of each consecutive group: if the group is a run of 0s, we keep it only when its size is <= 2.
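The mask above packs several steps into one expression, so here is a minimal sketch that spells them out with named intermediates. It rebuilds the question's sample data by hand; the names `filled`, `run_id`, `run_size` and `MAX_ZERO_RUN` are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Timestamp": ["00-00-10", "00-00-20", "00-00-30", "00-00-40", "00-00-50",
                  "00-01-00", "00-01-10", "00-01-20", "00-01-30", "00-01-40",
                  "00-01-50", "00-02-00", "00-02-10"],
    "Value": [34567, 45425, 46773, 64567, 25665, 25678, 84358,
              76869, 95830, 87890, 99537, 85957, 58840],
    "Result": [1.0, np.nan, 0.0, np.nan, 1.0, np.nan, np.nan,
               0.0, np.nan, np.nan, np.nan, 1.0, np.nan],
})

MAX_ZERO_RUN = 2  # hypothetical name for the "at most two timestamps" limit

# Step 1: propagate each 1.0 / 0.0 down over the NaNs that follow it.
filled = df["Result"].ffill()

# Step 2: label consecutive runs; the label increases whenever the filled value changes.
run_id = (filled != filled.shift()).cumsum()

# Step 3: attach the length of its run to every row.
run_size = filled.groupby(run_id).transform("size")

# Step 4: keep every row of a 1-run, and rows of a 0-run only if the run is short enough.
mask = (filled == 1) | ((filled == 0) & (run_size <= MAX_ZERO_RUN))
result = df[mask]
print(result)
```

Since `filled == 1` already covers the rows where Result is exactly 1, the separate `df.Result == 1` term is not needed in this form.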

蝴蝶刀刀

Sample data:

import pandas as pd

df = pd.DataFrame({'Timestamp': ['00-00-10', '00-00-20', '00-00-30', '00-00-40',
                                 '00-00-50', '00-01-00', '00-01-10', '00-01-20',
                                 '00-01-30', '00-01-40', '00-01-50', '00-02-00',
                                 '00-02-10'],
                   'Value': range(0, 13),
                   'Result': [1.0, None, 0.0, None, 1.0, None, None, 0.0, None, None, None, 1.0, None]})

Code:

# the row directly after each 1 (it keeps the index of the 1 row)
df1 = df.shift(-1)[df.Result == 1]
# rows where Result is 1
df2 = df[df.Result == 1]
# index of the 0 row with the earliest Timestamp
ind = df[(df.Result == 0) & (df.Timestamp == df[df.Result == 0].Timestamp.min())].index[0]
# select that row and the one after it
df3 = df.loc[[ind, ind + 1]]
# stack the three pieces to get the output below
result = pd.concat([df1, df2, df3])

Output:

    Result Timestamp  Value
0      NaN  00-00-20    1.0
4      NaN  00-01-00    5.0
11     NaN  00-02-10   12.0
0      1.0  00-00-10    0.0
4      1.0  00-00-50    4.0
11     1.0  00-02-00   11.0
2      0.0  00-00-30    2.0
3      NaN  00-00-40    3.0

You can then sort by index if needed. I hope this helps. That said, I am not sure I understood your last selection: I do not see why '00-01-10' is in your result.
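For the chunking approach described in the question itself (split df_2 into uninterrupted pieces, check each piece's length, append the short ones to df_1, then sort), a minimal sketch could look like the following. It assumes the rows are evenly spaced in time and that the DataFrame has the default integer index, so a gap in the index stands in for a gap in the timestamps; `chunk_id`, `MAX_LEN` and `short_chunks` are illustrative names, not part of either answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
data = pd.DataFrame({
    "Timestamp": ["00-00-10", "00-00-20", "00-00-30", "00-00-40", "00-00-50",
                  "00-01-00", "00-01-10", "00-01-20", "00-01-30", "00-01-40",
                  "00-01-50", "00-02-00", "00-02-10"],
    "Value": [34567, 45425, 46773, 64567, 25665, 25678, 84358,
              76869, 95830, 87890, 99537, 85957, 58840],
    "Result": [1.0, np.nan, 0.0, np.nan, 1.0, np.nan, np.nan,
               0.0, np.nan, np.nan, np.nan, 1.0, np.nan],
})

# Same split as in the question, written via the forward-filled column.
filled = data["Result"].ffill()
df_1 = data[filled == 1]
df_2 = data[filled == 0]

# Assumption: default RangeIndex, so consecutive rows differ by exactly 1.
# A new chunk starts wherever the index jumps by more than 1.
chunk_id = (df_2.index.to_series().diff() != 1).cumsum()

MAX_LEN = 2  # hypothetical limit on the allowed chunk length

# Keep only the chunks that are short enough.
short_chunks = [chunk for _, chunk in df_2.groupby(chunk_id) if len(chunk) <= MAX_LEN]

# Append them to df_1 and restore the original (time) order.
combined = pd.concat([df_1, *short_chunks]).sort_index()
print(combined)
```

If the timestamps were not evenly spaced, the same idea should work by parsing them and comparing the difference between neighbouring rows with the expected step instead of using the index.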