德玛西亚99
I think you need:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': ['a','a','a','a','b','b','b','c','d']})

    s = df['A'].value_counts()
    print (s)
    a    4
    b    3
    d    1
    c    1
    Name: A, dtype: int64

If you need to sum all values below the threshold:

    threshold = 2
    m = s < threshold

    #keep only the values at or above the threshold
    out = s[~m]
    #sum the values below the threshold and add them as a new 'misc' entry
    out['misc'] = s[m].sum()
    print (out)
    a       4
    b       3
    misc    2
    Name: A, dtype: int64

But if you only need to rename the index values that fall below the threshold:

    out = s.rename(dict.fromkeys(s.index[s < threshold], 'misc'))
    print (out)
    a       4
    b       3
    misc    1
    misc    1
    Name: A, dtype: int64

If you need to replace the values in the original column, use GroupBy.transform with numpy.where:

    df['A'] = np.where(df.groupby('A')['A'].transform('size') < threshold, 'misc', df['A'])
    print (df)
          A
    0     a
    1     a
    2     a
    3     a
    4     b
    5     b
    6     b
    7  misc
    8  misc
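If the replacement is needed in several places, the column-replacement step above could be wrapped in a small helper. This is only a sketch: the function name `lump_rare` and its `label` parameter are hypothetical and not part of pandas, and it swaps `GroupBy.transform` for `Series.map` on the counts, which should give the same result for this data.

```python
import pandas as pd


def lump_rare(values: pd.Series, threshold: int, label: str = "misc") -> pd.Series:
    """Replace categories that occur fewer than `threshold` times with `label`.

    Hypothetical helper for illustration; not a pandas API.
    """
    # Look up, for every row, how often that row's value occurs in the column.
    counts = values.map(values.value_counts())
    # Keep values whose count reaches the threshold, replace the rest.
    return values.where(counts >= threshold, label)


df = pd.DataFrame({"A": ["a", "a", "a", "a", "b", "b", "b", "c", "d"]})
df["A"] = lump_rare(df["A"], threshold=2)
print(df["A"].value_counts())
```

With `threshold=2`, the rows holding `c` and `d` become `misc`, so the counts are 4 for `a`, 3 for `b` and 2 for `misc`.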