元芳怎么了
First, separate the codes inside each cell; then you can extract the leading part of each code and group by it:

    # separate the codes
    tmp = df.assign(FirstCode=df.Alpha.str.split(','))

    # keep the part before '-' of every code, as a sorted tuple of unique values
    tmp['FirstCode'] = [tuple(sorted(set(x.split('-')[0] for x in cell)))
                        for cell in tmp.FirstCode]

    # sum per first-code tuple with groupby
    sum_per_code = tmp['AlphaComboCount'].groupby(tmp['FirstCode']).transform('sum')

    # the percentage is just a simple division
    tmp['Percent'] = tmp['AlphaComboCount'] / sum_per_code

    # print the output
    print(tmp.sort_values('FirstCode'))

Output:

                      Alpha  AlphaComboCount       FirstCode   Percent
    0                 12-99             8039           (12,)  0.918743
    11         12-581,12-99              711           (12,)  0.081257
    2          12-99,138-99             1776       (12, 138)  0.528414
    3          12-45,138-45             1585       (12, 138)  0.471586
    6                121-99             1102          (121,)  1.000000
    14  123-49,121-29,22-79              626  (121, 123, 22)  0.993651
    15  121-99,123-99,22-99                4  (121, 123, 22)  0.006349
    8          121-99,22-99              909       (121, 22)  1.000000
    5                123-99             1145          (123,)  1.000000
    13             2089-281              685         (2089,)  1.000000
    4                 21-99             1225           (21,)  0.532609
    7                21-581             1000           (21,)  0.434783
    10               21-141               75           (21,)  0.032609
    1                 22-99             1792           (22,)  1.000000
    9                 32-99              814           (32,)  1.000000
    12               347-99              685          (347,)  1.000000
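The snippet in this answer assumes `df` already exists. As a minimal self-contained sketch, the same steps can be run on the sample data shown in the printed output. The helper name `add_percent_by_code_set` is hypothetical and only used here for illustration; it is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the output printed in the answer.
df = pd.DataFrame({
    "Alpha": [
        "12-99", "22-99", "12-99,138-99", "12-45,138-45", "21-99",
        "123-99", "121-99", "21-581", "121-99,22-99", "32-99",
        "21-141", "12-581,12-99", "347-99", "2089-281",
        "123-49,121-29,22-79", "121-99,123-99,22-99",
    ],
    "AlphaComboCount": [
        8039, 1792, 1776, 1585, 1225, 1145, 1102, 1000,
        909, 814, 75, 711, 685, 685, 626, 4,
    ],
})


def add_percent_by_code_set(frame):
    """Hypothetical helper: share of each row within its set of leading codes."""
    out = frame.copy()
    # Split each cell on ',' and keep the part before '-' of every code,
    # stored as a sorted tuple of unique values so it can act as a group key.
    out["FirstCode"] = [
        tuple(sorted({code.split("-")[0] for code in cell.split(",")}))
        for cell in out["Alpha"]
    ]
    # Broadcast each group's total back onto its rows, then divide.
    totals = out["AlphaComboCount"].groupby(out["FirstCode"]).transform("sum")
    out["Percent"] = out["AlphaComboCount"] / totals
    return out


tmp = add_percent_by_code_set(df)
print(tmp.sort_values("FirstCode"))
```

Note that the group key is the whole set of leading codes in a cell, so `12-99` and `12-99,138-99` land in different groups, `(12,)` and `(12, 138)`.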
森林海
If the Alpha column holds several codes in varying order, one possible solution is to pick one of them (e.g. the minimum, which here means the lexicographically smallest string), take the part before the "-", save it in a new column, and use that column in further processing:

    df['Alpha_1'] = df.Alpha.str.split(',')\
        .apply(lambda lst: min(lst)).str.split('-', expand=True)[0]

The result is:

                      Alpha  AlphaComboCount Alpha_1
    0                 12-99             8039      12
    1                 22-99             1792      22
    2          12-99,138-99             1776      12
    3          12-45,138-45             1585      12
    4                 21-99             1225      21
    5                123-99             1145     123
    6                121-99             1102     121
    7                21-581             1000      21
    8          121-99,22-99              909     121
    9                 32-99              814      32
    10               21-141               75      21
    11         12-581,12-99              711      12
    12               347-99              685     347
    13             2089-281              685    2089
    14  123-49,121-29,22-79              626     121
    15  121-99,123-99,22-99                4     121

To compute the percentage of AlphaComboCount within each group (rows sharing a particular value of Alpha_1), define the following function:

    def proc(grp):
        return (grp.AlphaComboCount / grp.AlphaComboCount.sum() * 100)\
            .apply('{0:.2f}%'.format)

Group df by Alpha_1 and apply this function, saving the result in a Grp_pct column:

    df['Grp_pct'] = df.groupby('Alpha_1').apply(proc).reset_index(level=0, drop=True)

To check the result easily, print df with the rows of each group placed together:

    print(df.sort_values('Alpha_1'))

which gives:

                      Alpha  AlphaComboCount Alpha_1  Grp_pct
    0                 12-99             8039      12   66.38%
    2          12-99,138-99             1776      12   14.66%
    3          12-45,138-45             1585      12   13.09%
    11         12-581,12-99              711      12    5.87%
    6                121-99             1102     121   41.73%
    8          121-99,22-99              909     121   34.42%
    14  123-49,121-29,22-79              626     121   23.70%
    15  121-99,123-99,22-99                4     121    0.15%
    5                123-99             1145     123  100.00%
    13             2089-281              685    2089  100.00%
    4                 21-99             1225      21   53.26%
    7                21-581             1000      21   43.48%
    10               21-141               75      21    3.26%
    1                 22-99             1792      22  100.00%
    9                 32-99              814      32  100.00%
    12               347-99              685     347  100.00%

Now compare, for example, the rows with Alpha_1 == 21 against your expected result for code 21.
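As a possible variation on this answer (not the author's code), the per-group percentage can also be computed with `groupby(...).transform('sum')` instead of `apply`, which keeps a numeric column alongside the formatted one. The sketch below rebuilds the sample data from the tables above; the column name `Grp_pct_num` is an assumption introduced only for this example.

```python
import pandas as pd

# Sample data copied from the tables printed in the answer.
df = pd.DataFrame({
    "Alpha": [
        "12-99", "22-99", "12-99,138-99", "12-45,138-45", "21-99",
        "123-99", "121-99", "21-581", "121-99,22-99", "32-99",
        "21-141", "12-581,12-99", "347-99", "2089-281",
        "123-49,121-29,22-79", "121-99,123-99,22-99",
    ],
    "AlphaComboCount": [
        8039, 1792, 1776, 1585, 1225, 1145, 1102, 1000,
        909, 814, 75, 711, 685, 685, 626, 4,
    ],
})

# Same key as in the answer: the lexicographically smallest "code-subcode"
# string in the cell, then the part before the '-'.
df["Alpha_1"] = (
    df["Alpha"].str.split(",")
    .apply(lambda lst: min(lst))
    .str.split("-").str[0]
)

# transform('sum') returns the group total aligned to every original row,
# so no index reshuffling is needed afterwards.
group_total = df.groupby("Alpha_1")["AlphaComboCount"].transform("sum")

# Numeric percentage (hypothetical column name), plus a formatted copy.
df["Grp_pct_num"] = df["AlphaComboCount"] / group_total * 100
df["Grp_pct"] = df["Grp_pct_num"].map("{:.2f}%".format)

print(df.sort_values("Alpha_1"))
```

Keeping the numeric column makes later sorting or filtering by percentage straightforward; the string column is only for display.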