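Is there a better way to run calculations on multiple dataframes and return statements?

  • My function takes 3 dataframes, filters each one between a different pair of dates, and builds a statement for each.

  • As you can see, the function repeats the same steps over and over, and I would like to cut down on that repetition.

  • I believe a for-loop would help, but I am not sure how to return the statements the way I do now, in a short piece of code.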

def stat_generator(df,date1,date2,df2,date3,date4,df4,date5,date6):

    ##First Date Filter for First Dataframe, and calculations for first dataframe

    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
    mask = ((df['Announcement Date'] >= date1) & (df['Announcement Date'] <= date2))
    df_new = df.loc[mask]
    total = len(df_new)
    better = df_new[(df_new['performance'] == 'better')]
    better_perc = round(((len(better)/total)*100),2)
    worse = df_new[(df_new['performance'] == 'worse')]
    worse_perc = round(((len(worse)/total)*100),2)
    statement1 = "During the time period between {} and {}, {} % of the students performed better. {} % of the students performed worse".format(date1,date2,better_perc,worse_perc)

    ##Second Date Filter for Second Dataframe, and calculations for second dataframe

    df2['Announcement Date'] = pd.to_datetime(df2['Announcement Date'])
    mask2 = ((df2['Announcement Date'] >= date3) & (df2['Announcement Date'] <= date4))
    df_new2 = df2.loc[mask2]
    total2 = len(df_new2)
    better2 = df_new2[(df_new2['performance'] == 'better')]
    better_perc2 = round(((len(better2)/total2)*100),2)
    worse2 = df_new2[(df_new2['performance'] == 'worse')]
    worse_perc2 = round(((len(worse2)/total2)*100),2)
    statement2 = "During the time period between {} and {}, {} % of the students performed better. {} % of the students performed worse".format(date3,date4,better_perc2,worse_perc2)

    ##Third Date Filter for Third Dataframe, and calculations for third dataframe

叮当猫咪
2 answers

www说

I would pass only 3 parameters to your function, namely df, date1 and date2, and then call the function 3 times.

def stat_generator(df, date1, date2):
    "..."
    return statement

Then pass your data in as a list of lists or something similar. For example:

data = [[df, date1, date2], [df2, date3, date4], [df4, date5, date6]]
for lists in data:
    stat_generator(*lists)
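
To make the suggestion above concrete, here is a minimal runnable sketch of the one-dataframe-per-call idea. The sample data, the inner helper `pct`, and the guard against an empty date range are assumptions added for illustration; they are not part of the original question.

```python
import pandas as pd


def stat_generator(df, date1, date2):
    """Build the summary sentence for one dataframe and one date range."""
    dates = pd.to_datetime(df['Announcement Date'])
    df_new = df.loc[(dates >= date1) & (dates <= date2)]
    total = len(df_new)

    def pct(label):
        # Assumption: report 0.0 instead of dividing by zero when no rows match.
        if total == 0:
            return 0.0
        count = int((df_new['performance'] == label).sum())
        return round(count / total * 100, 2)

    return ("During the time period between {} and {}, {} % of the students "
            "performed better. {} % of the students performed worse"
            .format(date1, date2, pct('better'), pct('worse')))


# Hypothetical sample data, invented for this example.
df = pd.DataFrame({
    'Announcement Date': ['2020-01-05', '2020-01-10', '2020-01-20', '2020-02-15'],
    'performance': ['better', 'worse', 'better', 'better'],
})
df2 = pd.DataFrame({
    'Announcement Date': ['2020-03-01', '2020-03-02'],
    'performance': ['worse', 'worse'],
})

# One [dataframe, start, end] entry per call; results are collected in a list.
data = [[df, '2020-01-01', '2020-01-31'], [df2, '2020-03-01', '2020-03-31']]
statements = [stat_generator(*args) for args in data]
for statement in statements:
    print(statement)
```

Collecting the return values in a list keeps every statement available to the caller, whereas calling the function in a bare loop would discard them.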

尚方宝剑之说

Keeping the existing form

  • Change the df parameter of stat_generator to df1, so that df can be used in the for-loop.

  • Group the data for each dataframe together.

  • Create a statements list to be returned.

  • date1 and date2 become d1 and d2 inside the loop.

  • Update statement1 to use an f-string, which is easier to read.

  • I think these updates require the fewest changes to the overall code.

  • Optional: change mask to mask = df['Announcement Date'].between(d1, d2). Both endpoints are included by default; recent pandas versions expect inclusive='both' rather than inclusive=True.

def stat_generator(df1, date1, date2, df2, date3, date4, df4, date5, date6):

    # create groups: one (dataframe, start date, end date) tuple per dataframe
    groups = [(df1, date1, date2), (df2, date3, date4), (df4, date5, date6)]

    # create a list to hold one statement per dataframe
    statements = list()

    # iterate through each group
    for (df, d1, d2) in groups:
        df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])
        mask = ((df['Announcement Date'] >= d1) & (df['Announcement Date'] <= d2))
        df_new = df.loc[mask]
        total = len(df_new)
        better = df_new[(df_new['performance'] == 'better')]
        better_perc = round(((len(better)/total)*100),2)
        worse = df_new[(df_new['performance'] == 'worse')]
        worse_perc = round(((len(worse)/total)*100),2)
        statement1 = f"During the time period between {d1} and {d2}, {better_perc}% of the students performed better. {worse_perc}% of the students performed worse"

        # append the statement for this dataframe
        statements.append(statement1)

    # return a list of all the statements
    return statements

Rewriting completely

  • It is better for the function to do just one thing, which is to extract and return the data.

  • Handle passing multiple dataframes to the function outside of the function, and collect the results in a list or print them.

  • Creating new dataframes for better and worse is not efficient.

  • Use .value_counts() with normalize=True to get the percentages.

def stat_generator(df: pd.DataFrame, d1: str, d2: str) -> str:
    df['Announcement Date'] = pd.to_datetime(df['Announcement Date'])

    # create the mask (both endpoints are included by default)
    mask = df['Announcement Date'].between(d1, d2)

    # apply the mask
    df_new = df.loc[mask]

    # calculate the percentage of each performance label
    per = (df_new.performance.value_counts(normalize=True) * 100).round(2)

    return f"During the time period between {d1} and {d2}, {per['better']}% of the students performed better. {per['worse']}% of the students performed worse"


groups = [(df1, date1, date2), (df2, date3, date4), (df3, date5, date6)]
statements = list()
for group in groups:
    statements.append(stat_generator(*group))
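
As a rough, self-contained check of the .value_counts(normalize=True) idea from this answer, the sketch below runs it on invented sample data. The dataframes, the date strings, and the use of per.get(..., 0.0) to handle a label that never occurs in the filtered range are assumptions for illustration, not something the original answer specifies.

```python
import pandas as pd


def stat_generator(df: pd.DataFrame, d1: str, d2: str) -> str:
    """Summarise one dataframe for the inclusive date range d1..d2."""
    dates = pd.to_datetime(df['Announcement Date'])
    # Series.between includes both endpoints by default.
    df_new = df.loc[dates.between(d1, d2)]
    # Share of each performance label, expressed as a percentage.
    per = (df_new['performance'].value_counts(normalize=True) * 100).round(2)
    # Assumption: a label with no rows is reported as 0.0 instead of raising KeyError.
    better = float(per.get('better', 0.0))
    worse = float(per.get('worse', 0.0))
    return (f"During the time period between {d1} and {d2}, {better}% of the "
            f"students performed better. {worse}% of the students performed worse")


# Hypothetical sample data, invented for this example.
df1 = pd.DataFrame({
    'Announcement Date': ['2021-05-01', '2021-05-03', '2021-05-10',
                          '2021-05-20', '2021-06-15'],
    'performance': ['better', 'better', 'worse', 'better', 'worse'],
})
df2 = pd.DataFrame({
    'Announcement Date': ['2021-07-01', '2021-07-02'],
    'performance': ['worse', 'worse'],
})

groups = [(df1, '2021-05-01', '2021-05-31'), (df2, '2021-07-01', '2021-07-31')]
statements = [stat_generator(*group) for group in groups]
for statement in statements:
    print(statement)
```

This variant converts the dates into a local Series rather than overwriting the column, so the caller's dataframe is left unchanged; that is a design choice of the sketch, not a requirement of the approach.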