Python: get the second idxmax from nsmallest when there is a tie

I have the DataFrame below. What I want is the 1st, 2nd, ..., n-th minimum Value for each Article, together with the Name each Value comes from.


df

   Article  Name  Value
0     A_01  P_01    360
1     A_03  P_01    625
2     A_01  P_07    360
3     A_01  P_09    370
4     A_02  P_09    847
5     A_03  P_09    685
6     A_03  P_18    650
7     A_02  P_22    935
8     A_03  P_22    625
9     A_02  P_25    750
10    A_03  P_25    600
11    A_01  P_26    500
12    A_02  P_26    750
13    A_03  P_26    600
14    A_01  P_33    480
15    A_03  P_33    750

I am using the code below to find the n minimum values and the corresponding n names for each Article. First, I pivot my df to get:


list_articles = df['Article'].drop_duplicates()
list_names = list(df['Name'].drop_duplicates())

pivot_df = df.pivot(index='Article', columns='Name', values='Value').reset_index()

pivot_df

Name Article   P_01   P_07   P_09   P_18   P_22   P_25   P_26   P_33
0       A_01  360.0  360.0  370.0    NaN    NaN    NaN  500.0  480.0
1       A_02    NaN    NaN  847.0    NaN  935.0  750.0  750.0    NaN
2       A_03  625.0    NaN  685.0  650.0  625.0  600.0  600.0  750.0

Then I ran a lambda function to find each minimum value and the corresponding name:


for i in range(1, 4):  # minimum 3
    pivot_df[f'Min_{i}_Value'] = pivot_df[list_names].T.apply(lambda x: x.nsmallest(i).max())
    pivot_df[f'Min_{i}_Name'] = pivot_df[list_names].T.apply(lambda x: x.nsmallest(i).idxmax())

This gives me 6 more columns:


pivot_df

Name Article   P_01   P_07  ...  Min_2_Name  Min_3_Value  Min_3_Name
0       A_01  360.0  360.0  ...        P_01        370.0        P_09
1       A_02    NaN    NaN  ...        P_25        847.0        P_09
2       A_03  625.0    NaN  ...        P_25        625.0        P_01


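Finally, what is my problem? If you look closely, you will see that Min_1_Value and Min_2_Value are the same (which is correct), but Min_1_Name and Min_2_Name are also the same, which is not correct. Why? Because in the original data the same article has two names with the same value, so it is a tie. My code looks up the name of the n-th minimum through the index of the n-th minimum value, so when more than one name matches, it does not account for the tie. How can I assign Min_2_Name correctly, so that it differs from Min_1_Name? Picking alphabetically is fine, it does not matter. Do you have any ideas?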
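To make the tie concrete, here is a minimal sketch that uses only the A_01 values from the pivoted table. The variable names are illustrative and not part of the original code. It shows that idxmax returns the first label holding the maximum, and that taking the last label of nsmallest(i) is one possible way around it, assuming the default keep='first' behaviour, which keeps tied values in order of appearance.

```python
import pandas as pd

# Values of article A_01 from the pivoted table, indexed by Name
row = pd.Series({'P_01': 360.0, 'P_07': 360.0, 'P_09': 370.0,
                 'P_26': 500.0, 'P_33': 480.0})

first = row.nsmallest(1)    # only P_01
second = row.nsmallest(2)   # P_01 and P_07, both 360.0

# idxmax returns the FIRST label with the maximum value,
# so the tie collapses to the same name in both cases.
name_1 = first.idxmax()
name_2_tied = second.idxmax()

# Possible workaround (an assumption, not the asker's code):
# take the last entry of the sorted nsmallest result instead.
name_2_fixed = second.index[-1]
value_2_fixed = second.iloc[-1]

print(name_1, name_2_tied, name_2_fixed, value_2_fixed)
```

Note that this workaround repeats the last available name when a row has fewer than i non-missing values, so it is only a sketch for data like the example above.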


饮歌长啸
2 answers

Smart猫小萌

I think your solution can be simplified. First, sort by the two columns with DataFrame.sort_values:

df = df.sort_values(['Article','Value'])
print (df)
   Article  Name  Value
0     A_01  P_01    360
2     A_01  P_07    360
3     A_01  P_09    370
14    A_01  P_33    480
11    A_01  P_26    500
9     A_02  P_25    750
12    A_02  P_26    750
4     A_02  P_09    847
7     A_02  P_22    935
10    A_03  P_25    600
13    A_03  P_26    600
1     A_03  P_01    625
8     A_03  P_22    625
6     A_03  P_18    650
5     A_03  P_09    685
15    A_03  P_33    750

Then create a counter Series with GroupBy.cumcount and filter the top 3 values with boolean indexing, add a MultiIndex and reshape with DataFrame.unstack, and finally flatten the MultiIndex in the columns with f-strings:

g = df.groupby('Article').cumcount().add(1)
mask = g < 4
df = df[mask].set_index(['Article',g[mask]]).unstack().sort_index(axis=1, level=1)
df.columns = df.columns.map(lambda x: f'Min_{x[1]}_{x[0]}')
df = df.reset_index()
print (df)
  Article Min_1_Name  Min_1_Value Min_2_Name  Min_2_Value Min_3_Name  \
0    A_01       P_01          360       P_07          360       P_09   
1    A_02       P_25          750       P_26          750       P_09   
2    A_03       P_25          600       P_26          600       P_01   

   Min_3_Value  
0          370  
1          847  
2          625  
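For reference, the sort-and-count idea can be wrapped into a small self-contained sketch. The helper name n_smallest_per_article and the intermediate column pos are hypothetical, introduced only for illustration. The one deliberate change is that Name is added as a third sort key, on the assumption that ties should be broken alphabetically as the question allows.

```python
import pandas as pd

rows = [('A_01', 'P_01', 360), ('A_03', 'P_01', 625), ('A_01', 'P_07', 360),
        ('A_01', 'P_09', 370), ('A_02', 'P_09', 847), ('A_03', 'P_09', 685),
        ('A_03', 'P_18', 650), ('A_02', 'P_22', 935), ('A_03', 'P_22', 625),
        ('A_02', 'P_25', 750), ('A_03', 'P_25', 600), ('A_01', 'P_26', 500),
        ('A_02', 'P_26', 750), ('A_03', 'P_26', 600), ('A_01', 'P_33', 480),
        ('A_03', 'P_33', 750)]
df = pd.DataFrame(rows, columns=['Article', 'Name', 'Value'])

def n_smallest_per_article(data, n=3):
    """Hypothetical helper: one row per Article with its n smallest Values and Names."""
    # Sorting by Name last breaks ties between equal Values alphabetically
    ordered = data.sort_values(['Article', 'Value', 'Name'])
    # 1-based position of each row inside its Article group
    pos = ordered.groupby('Article').cumcount().add(1)
    top = ordered[pos <= n].assign(pos=pos)
    # Move the position into the columns, then order them as Min_1_*, Min_2_*, ...
    wide = top.set_index(['Article', 'pos']).unstack('pos').sort_index(axis=1, level=1)
    wide.columns = [f'Min_{p}_{col}' for col, p in wide.columns]
    return wide.reset_index()

result = n_smallest_per_article(df, n=3)
print(result)
```

Articles with fewer than n rows would simply get NaN in the missing positions, which also turns the affected Value columns into floats.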

慕莱坞森

You can do it by replacing the previous minimum with NaN, so that each pass finds the next one.

import pandas as pd
import numpy as np

df = pd.DataFrame(
      [['A_01', 'P_01', 360],
       ['A_03', 'P_01', 625],
       ['A_01', 'P_07', 360],
       ['A_01', 'P_09', 370],
       ['A_02', 'P_09', 847],
       ['A_03', 'P_09', 685],
       ['A_03', 'P_18', 650],
       ['A_02', 'P_22', 935],
       ['A_03', 'P_22', 625],
       ['A_02', 'P_25', 750],
       ['A_03', 'P_25', 600],
       ['A_01', 'P_26', 500],
       ['A_02', 'P_26', 750],
       ['A_03', 'P_26', 600],
       ['A_01', 'P_33', 480],
       ['A_03', 'P_33', 750]])
df.columns=['Article','Name','Value']

list_articles = df['Article'].drop_duplicates()
list_names = list(df['Name'].drop_duplicates())

pivot_df = df.pivot(index='Article', columns='Name', values='Value').reset_index()

for i in range(1, 4):
    # current minimum of each row and the column (Name) it comes from
    pivot_df[f'Min_{i}_Value'] = pivot_df[list_names].T.apply(lambda x: x.nsmallest(1).max())
    indices = pivot_df[list_names].T.apply(lambda y: y.nsmallest(1).idxmax())
    pivot_df[f'Min_{i}_Name'] = indices
    # blank out the minimum just found; .loc avoids chained assignment
    for row, name in indices.items():
        pivot_df.loc[row, name] = np.nan

# keep only Article and the Min_* columns
ColsToKeep = [x for x in pivot_df.columns.tolist() if x not in list_names]
ColsToKeep = [x for x in ColsToKeep if x[:3] == 'Min']
ColsToKeep.sort()
ColsToKeep = ['Article'] + ColsToKeep
final_df = pivot_df[ColsToKeep]
final_df
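The same masking idea can also be sketched more compactly with DataFrame.idxmin and DataFrame.min. The helper name peel_minima is hypothetical, and the sketch assumes every Article has at least n non-missing values, since idxmin on an all-NaN row is not handled here. Because idxmin returns the first column holding the minimum and pivot sorts the Name columns, ties end up resolved alphabetically.

```python
import numpy as np
import pandas as pd

rows = [('A_01', 'P_01', 360), ('A_03', 'P_01', 625), ('A_01', 'P_07', 360),
        ('A_01', 'P_09', 370), ('A_02', 'P_09', 847), ('A_03', 'P_09', 685),
        ('A_03', 'P_18', 650), ('A_02', 'P_22', 935), ('A_03', 'P_22', 625),
        ('A_02', 'P_25', 750), ('A_03', 'P_25', 600), ('A_01', 'P_26', 500),
        ('A_02', 'P_26', 750), ('A_03', 'P_26', 600), ('A_01', 'P_33', 480),
        ('A_03', 'P_33', 750)]
df = pd.DataFrame(rows, columns=['Article', 'Name', 'Value'])

# One row per Article, one column per Name (Article stays in the index here)
pivot = df.pivot(index='Article', columns='Name', values='Value')

def peel_minima(table, n=3):
    """Hypothetical helper: take the row-wise minimum n times, masking it after each pass."""
    work = table.astype(float)             # astype returns a copy, the input is not modified
    out = pd.DataFrame(index=work.index)
    for i in range(1, n + 1):
        names = work.idxmin(axis=1)        # first column holding each row's minimum
        out[f'Min_{i}_Value'] = work.min(axis=1)
        out[f'Min_{i}_Name'] = names
        # Mask the value just reported so the next pass finds the next minimum
        for row, name in names.items():
            work.loc[row, name] = np.nan
    return out

out = peel_minima(pivot, n=3)
print(out)
```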