梦里花落0921
Another solution, using the pivot_table() method:

    In [5]: df.pivot_table(index='Event', columns='Status', aggfunc=len, fill_value=0)
    Out[5]:
    Status  FAILED  SUCCESS
    Event
    Run          0        2
    Walk         1        1

Timing against a 700K-row DF:

    In [74]: df.shape
    Out[74]: (700000, 2)

    In [75]: # (c) Merlin

    In [76]: %%timeit
       ....: pd.crosstab(df.Event, df.Status)
       ....:
    1 loop, best of 3: 333 ms per loop

    In [77]: # (c) piRSquared

    In [78]: %%timeit
       ....: df.groupby('Event').Status.value_counts().unstack().fillna(0)
       ....:
    1 loop, best of 3: 325 ms per loop

    In [79]: # (c) MaxU

    In [80]: %%timeit
       ....: df.pivot_table(index='Event', columns='Status',
       ....:                aggfunc=len, fill_value=0)
       ....:
    1 loop, best of 3: 367 ms per loop

    In [81]: # (c) ayhan

    In [82]: %%timeit
       ....: (df.assign(ones = np.ones(len(df)))
       ....:    .pivot_table(index='Event', columns='Status',
       ....:                 aggfunc=np.sum, values = 'ones')
       ....: )
       ....:
    1 loop, best of 3: 264 ms per loop

    In [83]: # (c) Divakar

    In [84]: %%timeit
       ....: unq1,ID1 = np.unique(df['Event'],return_inverse=True)
       ....: unq2,ID2 = np.unique(df['Status'],return_inverse=True)
       ....: # Get linear indices/tags corresponding to grouped headers
       ....: tag = ID1*(ID2.max()+1) + ID2
       ....: # Setup 2D Numpy array equivalent of expected Dataframe
       ....: out = np.zeros((len(unq1),len(unq2)),dtype=int)
       ....: unqID, count = np.unique(tag,return_counts=True)
       ....: np.put(out,unqID,count)
       ....: # Finally convert to Dataframe
       ....: df_out = pd.DataFrame(out,columns=unq2)
       ....: df_out.index = unq1
       ....:
    1 loop, best of 3: 2.25 s per loop

Conclusion: @ayhan's solution is the fastest of the five in this run, at 264 ms:

    (df.assign(ones = np.ones(len(df)))
       .pivot_table(index='Event', columns='Status', values = 'ones',
                    aggfunc=np.sum, fill_value=0))
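To check that these approaches agree before timing them, here is a minimal sketch on a small frame. The four-row sample data is hypothetical, chosen only to reproduce the 2x2 table in Out[5]; the helper name `counts` is likewise made up for illustration. Two details differ from the snippets above and are assumptions of this sketch: `groupby(...).size().unstack(fill_value=0)` stands in for the `value_counts().unstack().fillna(0)` variant (it keeps integer counts), and the helper column is built with `assign(ones=1)` and the string `aggfunc="sum"` rather than `np.ones` and `np.sum`.

```python
import pandas as pd

# Hypothetical sample data, chosen to reproduce the table in Out[5].
df = pd.DataFrame({
    "Event": ["Run", "Walk", "Run", "Walk"],
    "Status": ["SUCCESS", "FAILED", "SUCCESS", "SUCCESS"],
})

# Approach 1: crosstab counts co-occurrences of the two columns directly.
by_crosstab = pd.crosstab(df["Event"], df["Status"])

# Approach 2: group on both keys, count rows, then pivot Status into columns.
# fill_value=0 fills missing combinations without switching to floats.
by_groupby = df.groupby(["Event", "Status"]).size().unstack(fill_value=0)

# Approach 3: add a helper column of ones and sum it in a pivot table.
by_pivot = (
    df.assign(ones=1)
      .pivot_table(index="Event", columns="Status", values="ones",
                   aggfunc="sum", fill_value=0)
)


def counts(table):
    """Return the table's cells as nested lists of ints (hypothetical helper)."""
    return table.to_numpy().astype(int).tolist()


print(by_crosstab)
print(counts(by_crosstab) == counts(by_groupby) == counts(by_pivot))
```

On a real dataset, each expression can be wrapped in `%%timeit` as in the session above; the relative ranking may differ with pandas version, data size, and the number of distinct values.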