
Comparing IDs within groups: what is wrong with my grouping or loop?

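I want to split my dataset into 5-second time intervals, group each interval further by table, and apply conditional rules that classify the IDs by group size. My code looks like this: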


mindist = 100
maxdist = 200

grouped1 = result1.groupby(pd.TimeGrouper(key='date', freq="5S"))

group = []

for i, groups1 in grouped1:
    for g, tables1 in groups1.groupby('table'):
        for d1, d2 in zip(tables1.nnDist1, tables1.nnDist2):
            if tables1.id.nunique() > 2 and d1 < mindist and d2 < maxdist:
                group.append(2)
            elif tables1.id.nunique() == 2 and d1 < mindist and d2 > maxdist:
                group.append(1)
            elif tables1.id.nunique() < 2:
                group.append(0)
            else:
                group.append(9)

result1['gs_pred'] = group

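It basically does what I want: it splits the dataset into 5-second time bins, groups each bin further by table, checks whether each table has more than one unique ID, and, depending on the distances, classifies the rows as 0 = alone, 1 = pair, 2 = group.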


My problem is that it sometimes misclassifies my data even though the conditions are met.


For example, the group "Table5":


id      nnDist1             nnDist2             table   zone    pred_gs   date
3479.0  55.06369039574004   68.07613385026653   Table5  Zone2   0         2019-10-09 15:30:41.431
3477.0  55.06369039574004   99.14655818534048   Table5  Zone2   0         2019-10-09 15:30:41.431
3476.0  38.02749005658375   80.28754573408989   Table5  Zone2   2         2019-10-09 15:30:41.431
3473.0  38.02749005658375   68.07613385026653   Table5  Zone2   2         2019-10-09 15:30:41.431
3473.0  38.07413820430603   70.09457896299827   Table5  Zone2   2         2019-10-09 15:30:43.831
3479.0  53.91660226686884   70.09457896299827   Table5  Zone2   2         2019-10-09 15:30:43.831
3477.0  53.91660226686884   100.09240730444223  Table5  Zone2   2         2019-10-09 15:30:43.831
3476.0  38.07413820430603   80.2626314046803    Table5  Zone2   2         2019-10-09 15:30:43.831

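Why does it classify IDs 3479 and 3477 as 0 = sitting alone, even though they are in the same group as the other IDs and their distances are below the thresholds?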


Does anyone have an idea? What am I doing wrong with my grouping here? Thank you very much for your help!


临摹微笑
1 Answer

倚天杖

Avoid the looping solution: groupby sorts the group keys by default (sort=True), so the loop appends labels in sorted group order, while result1['gs_pred'] = group assigns them in the original row order of the data frame. The labels therefore end up on the wrong rows whenever the data is not already sorted by time bin and table. In addition, you are checking scalar values iteratively rather than working on vectorized Series. Instead, consider a groupby().apply() approach that uses np.select or np.where to assign the gs_pred column conditionally. With this approach all groups and their underlying values stay intact:

import numpy as np
import pandas as pd

def calc_groups(g):
    # LOGICAL CONDITIONS
    cond1 = (g['id'].nunique() >  2) & (g['nnDist1'] < mindist) & (g['nnDist2'] < maxdist)
    cond2 = (g['id'].nunique() == 2) & (g['nnDist1'] < mindist) & (g['nnDist2'] > maxdist)
    cond3 = (g['id'].nunique() <  2)

    # NUMPY SELECT APPROACH
    g['gs_pred2'] = np.select([cond1, cond2, cond3], [2, 1, 0], default=9)

    # NUMPY WHERE APPROACH
    g['gs_pred3'] = np.where(cond1, 2,
                             np.where(cond2, 1,
                                      np.where(cond3, 0, 9)))
    return g

# RUN ASSIGNMENT BY GROUP(S)
result1 = (result1.groupby([pd.Grouper(key='date', freq="5S"), 'table'], as_index=False)
                  .apply(calc_groups))
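To make the ordering problem concrete, here is a minimal sketch that first shows the sorted order in which groupby visits the groups, then labels the rows with a fully vectorized variant built on transform("nunique") instead of apply. It is an alternative sketch under assumptions, not the code from the answer: the helper name classify_group_size, the rounded distances, and the extra row (ID 9001 at Table1) are made up for illustration, and the frequency is written as "5s" on the assumption that recent pandas releases prefer the lowercase alias.

```python
import numpy as np
import pandas as pd

MINDIST = 100
MAXDIST = 200


def classify_group_size(df, mindist=MINDIST, maxdist=MAXDIST):
    """Return group-size labels aligned with df's own index (hypothetical helper)."""
    keys = [pd.Grouper(key="date", freq="5s"), "table"]
    # Unique IDs per (5-second bin, table), broadcast back onto every row,
    # so the result keeps the original row order.
    n_ids = df.groupby(keys)["id"].transform("nunique")
    close1 = df["nnDist1"] < mindist
    conditions = [
        (n_ids > 2) & close1 & (df["nnDist2"] < maxdist),   # 2 = group
        (n_ids == 2) & close1 & (df["nnDist2"] > maxdist),  # 1 = pair
        n_ids < 2,                                          # 0 = alone
    ]
    labels = np.select(conditions, [2, 1, 0], default=9)
    return pd.Series(labels, index=df.index, name="gs_pred")


# Four Table5 rows with rounded values from the question, plus one
# hypothetical lone ID at Table1 placed last in the frame.
result1 = pd.DataFrame({
    "id": [3479, 3477, 3476, 3473, 9001],
    "nnDist1": [55.06, 55.06, 38.03, 38.03, 150.0],
    "nnDist2": [68.08, 99.15, 80.29, 68.08, 300.0],
    "table": ["Table5", "Table5", "Table5", "Table5", "Table1"],
    "date": pd.to_datetime(["2019-10-09 15:30:41.431"] * 5),
})

# groupby iterates in sorted key order: Table1 comes first even though its
# row is last in the frame. A list filled inside such a loop would hand
# Table1's label to the first row (ID 3479).
keys = [pd.Grouper(key="date", freq="5s"), "table"]
iteration_order = [name[1] for name, _ in result1.groupby(keys)]

# Index-aligned assignment: every row receives its own group's label.
result1["gs_pred"] = classify_group_size(result1)
print(iteration_order)
print(result1[["id", "table", "gs_pred"]])
```

Because the labels come back as a Series indexed like the input, the assignment is matched by index rather than by position, so it does not matter in which order the groups are processed.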