
Calculate a row value from the previous row's value and write the result back as the new row value

Below is some dummy data that mirrors the data I am working with.

import pandas as pd
import numpy as np
from numpy import random

random.seed(30)

# Dummy data that represents a percent change
datelist = pd.date_range(start='1983-01-01', end='1994-01-01', freq='Y')
df1 = pd.DataFrame({"P Change_1": np.random.uniform(low=-0.55528, high=0.0396181, size=(11,)),
                    "P Change_2": np.random.uniform(low=-0.55528, high=0.0396181, size=(11,))})

# This dataframe contains the rows we want to operate on
df2 = pd.DataFrame({
    'Loc1': [None, None, None, None, None, None, None, None, None, None, 2.5415],
    'Loc2': [None, None, None, None, None, None, None, None, None, None, 3.2126],})

# Set the datetime index
df1 = df1.set_index(datelist)
df2 = df2.set_index(datelist)

df1:

            P Change_1  P Change_2
1984-12-31   -0.172080   -0.231574
1985-12-31   -0.328773   -0.247018
1986-12-31   -0.160834   -0.099079
1987-12-31   -0.457924    0.000266
1988-12-31    0.017374   -0.501916
1989-12-31   -0.349052   -0.438816
1990-12-31    0.034711    0.036164
1991-12-31   -0.415445   -0.415372
1992-12-31   -0.206852   -0.413107
1993-12-31   -0.313341   -0.181030
1994-12-31   -0.474234   -0.118058

df2:

              Loc1    Loc2
1984-12-31     NaN     NaN
1985-12-31     NaN     NaN
1986-12-31     NaN     NaN
1987-12-31     NaN     NaN
1988-12-31     NaN     NaN
1989-12-31     NaN     NaN
1990-12-31     NaN     NaN
1991-12-31     NaN     NaN
1992-12-31     NaN     NaN
1993-12-31     NaN     NaN
1994-12-31  2.5415  3.2126

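DataFrame details:

Loc1 corresponds to P Change_1, Loc2 corresponds to P Change_2, and so on. Taking Loc1 first, I want either to fill the DataFrame that holds Loc1 and Loc2 with the computed values, or to build a new DataFrame with columns Calc1 and Calc2.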


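Calculation:

I want to start from the 1994 value of Loc1 and compute a new value for 1993 as Loc1 1993 = Loc1 1994 + (Loc1 1994 * P Change_1 1993). The filled-in value would be 2.5415 + (-0.313341 * 2.5415), which is approximately 1.74514.


This value of 1.74514 replaces the NaN for 1993, and I then want to use that computed value to get the value for 1992. That is, the next step is Loc1 1992 = Loc1 1993 + (Loc1 1993 * P Change_1 1992). I want to repeat this row by row until I reach the earliest value in the time series.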
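
As a quick check of the arithmetic described above, the first two backward steps can be reproduced by hand. This is only a sketch: it uses the rounded values from the printed df1 and df2 rather than the full-precision random numbers, and the variable names are illustrative and not part of the original code.

```python
# Rounded values copied from the printed df1 / df2 (assumption: full-precision
# values in the real DataFrames will differ slightly in the last digits).
loc1_1994 = 2.5415
p_change_1993 = -0.313341
p_change_1992 = -0.206852

# Recurrence from the question: Loc1[t] = Loc1[t+1] + Loc1[t+1] * P Change_1[t]
loc1_1993 = loc1_1994 + loc1_1994 * p_change_1993
loc1_1992 = loc1_1993 + loc1_1993 * p_change_1992

print(round(loc1_1993, 5))  # 1.74514
print(loc1_1992)            # roughly 1.3842
```

Each step reuses the value computed in the step before it, which is why the calculation has to run from the latest row back to the earliest.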


What is the best way to implement this row-by-row equation? I hope this makes sense, and any help is much appreciated!


aluckdog
3 Answers

呼啦一阵风

df = pd.merge(df1, df2, how='inner', right_index=True, left_index=True)   # merge the dataframes on the date index
df['count'] = range(len(df))    # helper column holding the row position

# Split the dataframe in two parts: the rows up to the first non-NaN row, and the rows from it onward.
# .copy() keeps the later in-place edits from touching df (avoids SettingWithCopyWarning).
da1 = df[df['count'] <= df.dropna().iloc[0]['count']].copy()
da2 = df[df['count'] >= df.dropna().iloc[0]['count']].copy()

# Reverse da1 so that the known row comes first and we can walk backwards in time
da1.sort_values(by=['count'], ascending=False, inplace=True)

g = [da1, da2]
num_col = len(df1.columns)

for w in range(len(g)):
    count = 0
    list_of_col = [list() for i in range(len(g[w]))]   # one list of Loc values per row
    for item, rows in g[w].iterrows():
        n = []
        if count == 0:
            # first row: take the known Loc values as they are
            for p in range(1, num_col + 1):
                n.append(rows[f'Loc{p}'])
        else:
            # later rows: previous value + previous value * percent change of this row
            for p in range(1, num_col + 1):
                n.append(list_of_col[count-1][p-1] + list_of_col[count-1][p-1] * rows[f'P Change_{p}'])
        list_of_col[count].extend(n)
        count += 1
    # transpose the row-wise lists into one list per column
    tmp = [list() for i in range(num_col)]
    for d_ in range(num_col):
        for x_ in range(len(list_of_col)):
            tmp[d_].append(list_of_col[x_][d_])
    z1 = []
    z1.extend(tmp)
    for i in range(num_col):
        g[w][f'Loc{i+1}'] = z1[i]

da1.sort_values(by=['count'], inplace=True)   # restore chronological order

final_df = pd.concat([da1, da2[1:]])

calc_df = pd.DataFrame()
for i in range(num_col):
    calc_df[f'Calc{i+1}'] = final_df[f'Loc{i+1}']
print(calc_df)

I have tried to explain all the obscure steps in the comments. I have edited my code so that the original DataFrames are left untouched.

[Edited]: I have edited the code to handle any number of columns in the given DataFrames.

[Edited]: If the column names in df1 and df2 are arbitrary, run this block before running the code above. It renames the columns using list comprehensions.

df1.columns = [f'P Change_{i+1}' for i in range(len(df1.columns))]
df2.columns = [f'Loc{i+1}' for i in range(len(df2.columns))]

慕仙森

There may be a better or more elegant way to do this, but this works well for me:

def fill_values(df1, df2, cols1=None, cols2=None):
    if cols1 is None: cols1 = df1.columns
    if cols2 is None: cols2 = df2.columns
    # walk backwards from the second-to-last row to the first row
    for i in reversed(range(df2.shape[0]-1)):
        for col1, col2 in zip(cols1, cols2):
            if np.isnan(df2[col2].iloc[i]):
                val = df2[col2].iloc[i+1] + df2[col2].iloc[i+1] * df1[col1].iloc[i]
                # write through df2.iloc directly; chained df2[col2].iloc[i] = val may not update df2
                df2.iloc[i, df2.columns.get_loc(col2)] = val
    return df1, df2

df1, df2 = fill_values(df1, df2)
print(df2)

                Loc1      Loc2
1983-12-31  0.140160  0.136329
1984-12-31  0.169291  0.177413
1985-12-31  0.252212  0.235614
1986-12-31  0.300550  0.261526
1987-12-31  0.554444  0.261457
1988-12-31  0.544976  0.524925
1989-12-31  0.837202  0.935388
1990-12-31  0.809117  0.902741
1991-12-31  1.384158  1.544128
1992-12-31  1.745144  2.631024
1993-12-31  2.541500  3.212600

This assumes that the rows in df1 and df2 correspond exactly (I look rows up by position, not by index label). Hope this helps!
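
Because each step multiplies the next year's value by (1 + percent change), the same recurrence can also be written as a reversed cumulative product, with no explicit Python loop. The following is a minimal sketch of that idea, not the method used in the answer above. The function name `backfill_from_last` is hypothetical, and the sketch assumes that only the last row of the levels frame is known, that both frames have the same row order, and that columns pair up by position.

```python
import numpy as np
import pandas as pd


def backfill_from_last(changes: pd.DataFrame, levels: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill `levels` backwards from its last row.

    Assumes identical row order in both frames, columns paired by position,
    and a fully known last row in `levels`.
    """
    growth = 1.0 + changes.to_numpy(dtype=float)   # per-row factor (1 + p), shape (n, k)
    growth[-1, :] = 1.0                            # the last row keeps its known value
    # Reverse cumulative product: factors[t] = product of growth[t], ..., growth[n-1]
    factors = np.cumprod(growth[::-1], axis=0)[::-1]
    last = levels.iloc[-1].to_numpy(dtype=float)   # known values, shape (k,)
    return pd.DataFrame(factors * last, index=levels.index, columns=levels.columns)


# Tiny illustrative input (made-up numbers, not the data from the question)
changes = pd.DataFrame({"P Change_1": [0.1, -0.5, 0.0], "P Change_2": [1.0, 0.5, 9.9]})
levels = pd.DataFrame({"Loc1": [np.nan, np.nan, 2.0], "Loc2": [np.nan, np.nan, 4.0]})
filled = backfill_from_last(changes, levels)
print(filled)
```

The percent change in the last row is ignored, matching the loop above, which never reads `df1` at the final position. Unlike the loop, this sketch overwrites every row rather than only the NaN entries, so it is only suitable when the last row is the single known value.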

饮歌长啸

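Just to be clear, what you need is Loc1[year] = Loc1[next_year] + PChange[year] * Loc1[next_year], right? The loop below does what you are looking for, but it simply assumes that the two DataFrames always have the same number of rows in the same order (rather than matching on the index values). Based on your description, I think this works for your data.

loc1_pos = df2.columns.get_loc('Loc1')   # column position, used for positional assignment
for i in range(df2.shape[0]-2, -1, -1):
    df2.iloc[i, loc1_pos] = df2['Loc1'].iloc[i+1] + (df1['P Change_1'].iloc[i] * df2['Loc1'].iloc[i+1])

Hope this helps :)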
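
Since the positional loops rely on both DataFrames having the same rows in the same order, it may be worth aligning them on the index first. The snippet below is a rough sketch with made-up data; the names `changes` and `levels` are illustrative stand-ins for df1 and df2.

```python
import pandas as pd

# Illustrative frames whose date indexes only partly overlap (made-up values)
changes = pd.DataFrame(
    {"P Change_1": [0.1, 0.2, 0.3]},
    index=pd.to_datetime(["1991-12-31", "1992-12-31", "1993-12-31"]),
)
levels = pd.DataFrame(
    {"Loc1": [float("nan"), 5.0]},
    index=pd.to_datetime(["1992-12-31", "1993-12-31"]),
)

# Keep only the dates present in both frames; axis=0 aligns rows and leaves columns alone
changes_aligned, levels_aligned = changes.align(levels, join="inner", axis=0)

# A positional loop is only safe once the two indexes are identical
same_rows = changes_aligned.index.equals(levels_aligned.index)
print(same_rows)
```

After this step, row i of `changes_aligned` and row i of `levels_aligned` refer to the same date, which is the assumption the loops rely on.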