Using groupby in Pandas to compare the contents of a column

Another solution, using the pivot_table() method:

    In [5]: df.pivot_table(index='Event', columns='Status', aggfunc=len, fill_value=0)
    Out[5]:
    Status  FAILED  SUCCESS
    Event
    Run          0        2
    Walk         1        1

Timings against a 700K-row DF:

    In [74]: df.shape
    Out[74]: (700000, 2)

    In [75]: # (c) Merlin

    In [76]: %%timeit
       ....: pd.crosstab(df.Event, df.Status)
       ....:
    1 loop, best of 3: 333 ms per loop

    In [77]: # (c) piRSquared

    In [78]: %%timeit
       ....: df.groupby('Event').Status.value_counts().unstack().fillna(0)
       ....:
    1 loop, best of 3: 325 ms per loop

    In [79]: # (c) MaxU

    In [80]: %%timeit
       ....: df.pivot_table(index='Event', columns='Status',
       ....:                aggfunc=len, fill_value=0)
       ....:
    1 loop, best of 3: 367 ms per loop

    In [81]: # (c) ayhan

    In [82]: %%timeit
       ....: (df.assign(ones = np.ones(len(df)))
       ....:    .pivot_table(index='Event', columns='Status',
       ....:                 aggfunc=np.sum, values = 'ones')
       ....: )
       ....:
    1 loop, best of 3: 264 ms per loop

    In [83]: # (c) Divakar

    In [84]: %%timeit
       ....: unq1,ID1 = np.unique(df['Event'],return_inverse=True)
       ....: unq2,ID2 = np.unique(df['Status'],return_inverse=True)
       ....: # Get linear indices/tags corresponding to grouped headers
       ....: tag = ID1*(ID2.max()+1) + ID2
       ....: # Setup 2D Numpy array equivalent of expected Dataframe
       ....: out = np.zeros((len(unq1),len(unq2)),dtype=int)
       ....: unqID, count = np.unique(tag,return_counts=True)
       ....: np.put(out,unqID,count)
       ....: # Finally convert to Dataframe
       ....: df_out = pd.DataFrame(out,columns=unq2)
       ....: df_out.index = unq1
       ....:
    1 loop, best of 3: 2.25 s per loop

Conclusion: @ayhan's solution currently wins:

    (df.assign(ones = np.ones(len(df)))
       .pivot_table(index='Event', columns='Status', values = 'ones',
                    aggfunc=np.sum, fill_value=0))
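For anyone who wants to try these approaches on a small frame first, here is a minimal sketch. The four-row `df` is an assumption: the original question's data is not shown here, so the rows were chosen only to reproduce the counts in `Out[5]`. The variable names (`by_crosstab`, `by_groupby`, `by_pivot`) are also illustrative, the groupby variant uses `size()` rather than `value_counts()`, and the string `"sum"` stands in for `np.sum`.

```python
import pandas as pd

# Hypothetical sample data (not from the original question), picked so the
# counts match the table above: Run -> 2 SUCCESS, Walk -> 1 FAILED + 1 SUCCESS.
df = pd.DataFrame({
    "Event": ["Run", "Run", "Walk", "Walk"],
    "Status": ["SUCCESS", "SUCCESS", "FAILED", "SUCCESS"],
})

# Approach 1: crosstab counts co-occurrences of Event and Status.
by_crosstab = pd.crosstab(df["Event"], df["Status"])

# Approach 2: group on both columns, count rows, then pivot Status into columns.
by_groupby = df.groupby(["Event", "Status"]).size().unstack(fill_value=0)

# Approach 3: add a helper column of ones and sum it in a pivot table.
by_pivot = (
    df.assign(ones=1)
      .pivot_table(index="Event", columns="Status", values="ones",
                   aggfunc="sum", fill_value=0)
)

print(by_crosstab)
print(by_groupby)
print(by_pivot)
```

All three should give the same counts. Depending on the pandas version, the pivot_table result may come back with a float dtype, so the cell values are the thing to compare, not the dtypes. Relative timings on a small frame like this say little about the 700K-row case.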

