
Finding neighbors in a pandas DataFrame

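I have a bike dataset with columns for the store, the location where the bikes were sold, and some information about each bike model. I need to compare the number of sales of the models within each store. To do that, I need to do the following:

  • Group the bikes by store: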

    groups = df.groupby('store_id')

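  • Then, for each model in that store, find the models with similar characteristics, i.e. similar height, length, weight, etc. For this I set a 10% difference bound: if the weight difference between two models is less than 10%, the other model is a comparable neighbor.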

  • Finally, for each model, see how it ranks among its competitors, and label it a "top seller" if it outperforms 50% of them.

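The problem is that I don't know how to do steps 2 and 3. Does anyone have an idea? I have looked at Groupby.Transform in the pandas documentation, but I don't see how it fits into the whole picture.

Thanks a lot for your help!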


ITMISS
1 answer

侃侃无极

Try this:

    import pandas as pd
    import numpy as np

    def sales_rank(x, df):
        # Sales of the models in x's neighborhood, indexed by model id.
        df_ns = df.set_index('id')
        df_ns = df_ns.loc[x.neighbors, 'sales']
        # Sort best seller first; the position after sorting is the rank (0 = best).
        df_ns = df_ns.sort_values(ascending=False)
        df_ns = df_ns.reset_index()
        return df_ns[df_ns.id == x.id].index[0]

    df = pd.DataFrame(data={'id': range(5),
                            'weight': [20, 21, 23, 43, 22],
                            'sales': [200, 100, 140, 100, 100]})

    # Neighbors of a model: ids whose weight is within 10% of that model's weight.
    df['neighbors'] = df.weight.apply(
        lambda x: df.id[np.isclose(df.weight.values, x, rtol=0.10)].values)
    df['sales_rank_in_neighborhood'] = df.apply(lambda x: sales_rank(x, df), axis=1)
    # Top seller: ranked in the better half of its neighborhood.
    df['top_seller'] = df.apply(
        lambda x: x.sales_rank_in_neighborhood < len(x.neighbors) // 2, axis=1)
    print(df)

Output:

       id  weight  sales     neighbors  sales_rank_in_neighborhood  top_seller
    0   0      20    200     [0, 1, 4]                           0        True
    1   1      21    100  [0, 1, 2, 4]                           3       False
    2   2      23    140     [1, 2, 4]                           0        True
    3   3      43    100           [3]                           0       False
    4   4      22    100  [0, 1, 2, 4]                           2       False

Note that a single-element neighborhood has no top seller. Adjust the rules to suit your purposes. I hope this helps!

Edit: I added a solution for groups, multiple rules for defining the neighborhood, and fixed the sales rank implementation:

    import pandas as pd
    import numpy as np

    def ns(x, df):
        # A neighbor must match on all three rules: weight within 10%, same gear, same type.
        weight_rule = np.isclose(df['weight'].values, x['weight'], rtol=0.10)
        gear_rule = df['gear'] == x['gear']
        type_rule = df['type'] == x['type']
        return df.id[np.logical_and.reduce((weight_rule, gear_rule, type_rule))].values

    def sales_rank(x, df):
        # Position of x among its neighbors when sorted by sales, best first.
        df_ns = df.set_index('id')
        df_ns = df_ns.loc[x.neighbors, 'sales']
        df_ns = df_ns.sort_values(ascending=False)
        df_ns = df_ns.reset_index()
        return df_ns[df_ns.id == x.id].index[0]

    df = pd.DataFrame(data={'store_id': [0, 1, 0, 1, 0],
                            'id': range(5),
                            'weight': [20, 21, 23, 43, 22],
                            'gear': [3, 3, 3, 7, 3],
                            'type': ['mountain', 'mountain', 'mountain', 'bmx', 'mountain'],
                            'sales': [200, 100, 140, 100, 100]})

    # Columns for results
    df['neighbors'] = ''
    df['sales_rank_in_neighborhood'] = ''
    df['top_seller'] = ''

    groups = df.groupby('store_id')
    for _, g in groups:
        # Work on a copy of this store's rows so the assignments do not hit a view of df.
        df_temp = df.loc[g.index, :].copy()
        df_temp['neighbors'] = df_temp.apply(lambda x: ns(x, df_temp), axis=1)
        df_temp['sales_rank_in_neighborhood'] = df_temp.apply(
            lambda x: sales_rank(x, df_temp), axis=1)
        df_temp['top_seller'] = df_temp.apply(
            lambda x: x.sales_rank_in_neighborhood < len(x.neighbors) // 2, axis=1)
        # Write the store's results back into the full frame.
        df.loc[g.index, :] = df_temp
    print(df)

Output:

       store_id  id  weight  gear      type  sales  neighbors sales_rank_in_neighborhood top_seller
    0         0   0      20     3  mountain    200     [0, 4]                          0       True
    1         1   1      21     3  mountain    100        [1]                          0      False
    2         0   2      23     3  mountain    140     [2, 4]                          0       True
    3         1   3      43     7       bmx    100        [3]                          0      False
    4         0   4      22     3  mountain    100  [0, 2, 4]                          2      False

I suppose there is a way to avoid looping over the groups, but this seems to do the trick.
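
On avoiding the loop over groups: one possible approach, offered as a sketch rather than a tested replacement, is to build a pairwise boolean matrix with NumPy broadcasting and treat "same store" as one more neighborhood rule. The helper `pairwise_equal` and the columns `n_neighbors` and `outsold_by` are names made up for this illustration. Two assumptions differ from the code above: the rank is the number of neighbors with strictly higher sales, so tied models share a rank instead of being ordered by the sort, and the matrix needs memory quadratic in the number of rows, so it only suits moderately sized data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "store_id": [0, 1, 0, 1, 0],
    "id": range(5),
    "weight": [20, 21, 23, 43, 22],
    "gear": [3, 3, 3, 7, 3],
    "type": ["mountain", "mountain", "mountain", "bmx", "mountain"],
    "sales": [200, 100, 140, 100, 100],
})


def pairwise_equal(values):
    """Boolean matrix whose [i, j] entry is True when rows i and j share a value.

    Hypothetical helper for this sketch, not part of pandas or NumPy.
    """
    arr = np.asarray(values)
    return arr[:, None] == arr[None, :]


weight = df["weight"].to_numpy(dtype=float)
sales = df["sales"].to_numpy()

# is_neighbor[i, j]: row j is in row i's neighborhood. The tolerance is relative
# to row i's weight, as in the answer's np.isclose call, so the relation is not
# necessarily symmetric. Every row is its own neighbor.
is_neighbor = (
    np.isclose(weight[None, :], weight[:, None], rtol=0.10)
    & pairwise_equal(df["store_id"])
    & pairwise_equal(df["gear"])
    & pairwise_equal(df["type"])
)

n_neighbors = is_neighbor.sum(axis=1)
# Rank = number of neighbors that strictly outsell the row (0 = best, ties share a rank).
outsold_by = (is_neighbor & (sales[None, :] > sales[:, None])).sum(axis=1)

result = df.assign(
    n_neighbors=n_neighbors,
    sales_rank_in_neighborhood=outsold_by,
    top_seller=outsold_by < n_neighbors // 2,
)
print(result)
```

On this sample data the `top_seller` column comes out the same as in the grouped loop above. Further rules (height, length, and so on) can be added as extra `&` terms.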