How do I offset pandas data columns by different amounts?

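I'm using pandas with matplotlib. My plotting strategy is to zero each series to its initial value and then offset each selected variable by a set amount. For example, this is my current plotting approach: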


import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# data is in a dataframe called inputData;
# TimeIndex (name of the time column) and colour are assumed to be defined elsewhere
timeseries_plots = ['var1', 'var3', 'var8']
offsetFactor = 20

for ii, var in enumerate(timeseries_plots):
    # first non-null value of the series, used to zero it
    offsetRef = inputData[var].loc[~inputData[var].isnull()].iloc[0]
    ax.plot(inputData[TimeIndex], offsetFactor * (len(timeseries_plots) - ii - 1) + inputData[var] - offsetRef, label=var, markersize=1, marker='None', linestyle='solid', color=colour)

plt.show()

This produces something like the following (with a few matplotlib tweaks):

http://img.mukewang.com/64632d060001b4b207630460.jpg

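As you can see, it subtracts offsetRef (in this case the variable's initial value) and then adds a multiple of the constant offsetFactor (here 20) to each variable. The result is lines that start out vertically separated by 20.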
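To make the arithmetic concrete, here is a minimal sketch of the same zero-then-offset step on a tiny frame. The helper name stacked_offsets and the demo values are made up for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

def stacked_offsets(df, columns, offset_factor):
    """Zero each column to its first non-null value, then space the
    columns offset_factor apart (the first column ends up on top)."""
    out = {}
    n = len(columns)
    for ii, col in enumerate(columns):
        offset_ref = df[col].dropna().iloc[0]  # first non-null value
        out[col] = offset_factor * (n - ii - 1) + df[col] - offset_ref
    return pd.DataFrame(out)

# Hypothetical demo data: var3 starts with a missing value.
demo = pd.DataFrame({'var1': [5.0, 6.0, 7.0],
                     'var3': [np.nan, 50.0, 52.0]})
shifted = stacked_offsets(demo, ['var1', 'var3'], 20)
print(shifted)
```

Each column of shifted can then be passed to ax.plot against the time column, exactly as in the loop above.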


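However, this becomes a problem when the values drift over time and one variable may cross another. What I'd like to do is reset the vertical offset, for example by changing offsetRef after specific dates.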


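I tried to do this as follows. I first initialize an array, offsetRef, with the same size as the variable. I then fill it with values recomputed at each of the resetDates. I've included comments marked #PSEUDO CODE where I roughly wrote down what I want to do, but apologies in advance as they are very rough. Thanks in advance!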


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

inputData = pd.DataFrame(np.random.randint(100, size=(100, 5)), columns=['timestamp','var2','var3','var4','var5'])
inputData['timestamp'] = pd.date_range('2020-05-01', '2020-08-08')

timeseries_plots = ['var2', 'var3', 'var4']
offsetFactor = 20
resetDates = ['2020-06-23', '2020-07-05']

for ii, var in enumerate(timeseries_plots):
    offsetRef = np.zeros(inputData[var].size)
    for tt, ttdate in enumerate(resetDates):
        if tt == 0:
            pass  # placeholder so the pseudocode below parses
            #PSEUDO CODE: offsetRef[ inputData['timestamp'] <resetDates[tt]] = inputData[var].loc[~inputData[var].isnull()].iloc[0]
            #PSEUDO CODE: offsetRef[ inputData['timestamp'] >=resetDates[tt]] = inputData[var].loc[~inputData[var].isnull()].iloc[ttdate]
        #PSEUDO CODE: offsetRef[ inputData['timestamp'] >=resetDates[tt]] = inputData[var].loc[~inputData[var].isnull()].iloc[ttdate]

    ax.plot(inputData['timestamp'], offsetFactor * (len(timeseries_plots) - ii - 1) + inputData[var] - offsetRef, label=var, markersize=1, marker='None', linestyle='solid')

plt.show()


猛跑小猪
108 views · 1 answer

当年话下

This is the solution I'm going with for now, in case it's useful to others:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# set up df
inputData = pd.DataFrame(np.random.randint(100, size=(100, 5)), columns=['timestamp','var2','var3','var4','var5'])
inputData['timestamp'] = pd.date_range('2020-05-01', '2020-08-08')
inputData['var2'] = np.arange(0, 100, 1.0)  # float so the column can hold NaN
inputData['var4'] = np.arange(0, 200, 2)
inputData.loc[0:2, 'var2'] = np.nan  # first three rows are missing

# set constants and settings
dispFactor = 20
timeseries_plots = ['var2', 'var4']
resetDates = ['2020-05-05', '2020-05-20', '2020-08-04']
offsetFactor = dispFactor

# begin
fig, ax = plt.subplots()
for ii, var in enumerate(timeseries_plots):
    offsetRef = np.zeros(inputData[var].size)
    firstValid = inputData[var].loc[~inputData[var].isnull()].iloc[0]
    for tt, ttdate in enumerate(resetDates):
        atReset = inputData['timestamp'] == ttdate
        if inputData.loc[atReset, var].isna().item():  # value at the reset date is NaN
            resetValue = inputData[var].bfill().loc[atReset].item()
        else:
            resetValue = inputData.loc[atReset, var].item()
        if tt == 0:
            # before the first reset date, zero to the first valid value
            offsetRef[(inputData['timestamp'] < ttdate).to_numpy()] = firstValid
        # from this reset date onwards, zero to the value at the reset date
        offsetRef[(inputData['timestamp'] >= ttdate).to_numpy()] = resetValue
    print(offsetRef)
    ax.plot(inputData['timestamp'], offsetFactor * (len(timeseries_plots) - ii - 1) + inputData[var] - offsetRef)
plt.show()

This "resets" the offset between the series to 20 at each of the positions in resetDates. I probably don't need the if logic that catches NaN data in either case (I could rely on .bfill() alone), but it makes me feel safer. I'll edit this as I improve the solution.
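As the closing remark suggests, relying on .bfill() alone may be enough. Below is a hedged, loop-free sketch of that idea, not the answer's own method: mark the reset rows, keep the back-filled value only there, and carry it forward. The helper name piecewise_offset_ref and the demo frame are hypothetical. It assumes every reset date appears in the time column and that a valid value exists at or after each reset date.

```python
import numpy as np
import pandas as pd

def piecewise_offset_ref(df, var, reset_dates, time_col='timestamp'):
    """Return, for each row, the reference value to subtract from df[var]."""
    values = df[var].bfill()  # NaNs take the next valid value
    is_reset = df[time_col].isin(pd.to_datetime(reset_dates))
    # Keep the value only on reset rows, then carry it forward in time.
    ref = values.where(is_reset).ffill()
    # Rows before the first reset date use the first valid value.
    first_valid = df[var].dropna().iloc[0]
    return ref.fillna(first_valid).to_numpy()

# Hypothetical demo: the value on the reset date itself is missing.
demo = pd.DataFrame({
    'timestamp': pd.date_range('2020-05-01', periods=6),
    'var2': [np.nan, 1.0, 2.0, np.nan, 4.0, 5.0],
})
ref = piecewise_offset_ref(demo, 'var2', ['2020-05-04'])
zeroed = demo['var2'] - ref
print(ref)
```

The returned array can take the place of offsetRef in the ax.plot call. Because the reset rows are matched by timestamp rather than visited in a loop, the order of reset_dates does not matter in this version.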