Creating a function to group and sort by columns of a Pandas DataFrame and assign labels

import pandas as pd
import numpy as np

df = pd.DataFrame([
    [100, 'm1',  1, 4],
    [200, 'm2',  7, 5],
    [120, 'm1',  4, 4],
    [240, 'm2',  8, 5],
    [300, 'm3',  5, 4],
    [330, 'm3',  2, 4],
    [350, 'm3', 11, 4],
    [200, 'm4',  9, 4]],
    columns=['Col1', 'Col2', 'Col3', 'Col4'])

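I want to split the data into groups based on Col2. However, the first match in each group should be assigned one value, and the remaining matches should be assigned a different value. Rahlf helped me create a function: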


def my_function(x, val):
    if x.shape[0]==1:
        if x.iloc[0]>val:
            return 'high'
        else:
            return 'low'

    if x.iloc[0]>val and any(i<=val for i in x.iloc[1:]):
        return 'high'
    elif x.iloc[0]>val:
        return 'med'
    elif x.iloc[0]<=val:
        return 'low'
    else:
        return np.nan

and then run:


df['Col5'] = df.sort_values(['Col2','Col1']).groupby('Col2')['Col3'].transform(my_function, (4))
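For context, here is a minimal sketch of why this call gives every row in a group the same label: transform() forwards the extra positional argument to the function and broadcasts a scalar return value across the whole group. The toy frame and the simplified function first_above are hypothetical stand-ins for illustration, not part of the original post.

```python
import pandas as pd

# Hypothetical toy data: two rows in group 'a', one row in group 'b'.
toy = pd.DataFrame({'key': ['a', 'a', 'b'], 'x': [5, 1, 2]})

def first_above(s, val):
    # Simplified stand-in: look only at the first value of the group.
    return 'high' if s.iloc[0] > val else 'low'

# The trailing 4 is passed through to first_above as `val`.
labels = toy.groupby('key')['x'].transform(first_above, 4)
print(labels.tolist())  # ['high', 'high', 'low']
```

Because the scalar is broadcast, distinguishing the first row from the rest requires the function to return one value per row instead.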

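However, I need to make two modifications to the function. First, instead of val, it should take the corresponding value from Col4. Second, it should return one value (such as "low") for the first match within the group (based on sorted Col1), and then a suffixed value (such as "low_red") for the remaining matches within the group.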


So my question is: how do I modify the function to do this?


慕田峪4524236
1 answer

沧海一幻觉

You can create a higher-level function (let's call it my_function()) that is called by transform(), and have it call a lower-level function (let's call it deeper_logic()) that applies the logic outlined in your question, like so:

def my_function(group):
    # Look up the Col4 values for this group's rows. iloc works here because
    # df has the default integer index; with any other index,
    # df.loc[group.index, 'Col4'] is the label-based equivalent.
    val = df.iloc[group.index]['Col4']
    value = deeper_logic(group.iloc[0], val.iloc[0], group)
    # The first row keeps the plain label; the rest get the '_red' suffix.
    return [value if i==0 else value + '_red' for i in range(group.shape[0])]

def deeper_logic(x, val, group):
    if group.shape[0]==1:
        if x>val:
            return 'high'
        else:
            return 'low'
    if x>val and any(i<=val for i in group.iloc[1:]):
        return 'high'
    elif x>val:
        return 'med'
    elif x<=val:
        return 'low'
    else:
        return np.nan

df['Col5'] = df.sort_values(['Col2','Col1']).groupby('Col2')['Col3'].transform(my_function)

This produces:

   Col1 Col2  Col3  Col4      Col5
0   100   m1     1     4       low
1   200   m2     7     5       med
2   120   m1     4     4   low_red
3   240   m2     8     5   med_red
4   300   m3     5     4      high
5   330   m3     2     4  high_red
6   350   m3    11     4  high_red
7   200   m4     9     4      high

Note that transform() operates on a Series and returns a like-indexed NDFrame, which is the result we want (that is, the original DataFrame's index is preserved). So we can call transform() on the Col3 column, and then, inside the function that transform() calls, use iloc with the group's original index to pull the corresponding Col4 values.
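The two-level approach above can also be written as one self-contained script. The following is only a sketch under a few assumptions: the names classify and label_group are hypothetical, the Col4 column is passed in explicitly instead of being read from the global df, the lookup is label-based (loc), and a missing value yields NaN for the whole group rather than failing on string concatenation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[100, 'm1', 1, 4], [200, 'm2', 7, 5], [120, 'm1', 4, 4],
     [240, 'm2', 8, 5], [300, 'm3', 5, 4], [330, 'm3', 2, 4],
     [350, 'm3', 11, 4], [200, 'm4', 9, 4]],
    columns=['Col1', 'Col2', 'Col3', 'Col4'])

def classify(first, rest, val):
    """Base label for a group (hypothetical helper mirroring deeper_logic)."""
    if pd.isna(first) or pd.isna(val):
        return np.nan
    if first <= val:
        return 'low'
    # first > val: 'high' for a single-row group or when any later value
    # is at or below the threshold, otherwise 'med'.
    if len(rest) == 0 or (rest <= val).any():
        return 'high'
    return 'med'

def label_group(group, thresholds):
    """One label per row: plain label first, '_red' suffix for the rest."""
    val = thresholds.loc[group.index[0]]  # Col4 value of the first sorted row
    base = classify(group.iloc[0], group.iloc[1:], val)
    if pd.isna(base):
        return [np.nan] * len(group)
    return [base] + [base + '_red'] * (len(group) - 1)

ordered = df.sort_values(['Col2', 'Col1'])
df['Col5'] = ordered.groupby('Col2')['Col3'].transform(
    lambda g: label_group(g, df['Col4']))
print(df)
```

Passing the threshold column as an argument keeps label_group independent of any global variable, and the result is assigned back by index, so the original row order is unchanged.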