Count identical entries per group in Python

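The core of the solution is to use the most common value (the most frequent row) as the reference. itertools.combinations gives the valid pairs of groups. Every row in each pair of groups is compared to the most frequent row, and sum() over the resulting truth matrix gives the number of matches. The rest is formatting. Note that when several rows tie for most frequent, the reference is whichever row value_counts() lists first.

    import io
    import itertools

    import pandas as pd

    df = pd.read_csv(io.StringIO("""group  base  height  weight  size
    0      A     10      5       M
    0      A     20      5       M
    1      A     10      10      S
    2      A     5       5       L"""), sep=r"\s+")

    # columns we're working with
    cols = [c for c in df.columns if c != "group"]

    # iterate over combinations of groups
    dfx = pd.DataFrame()
    for gp in itertools.combinations(df.group.unique(), 2):
        # rows belonging to either group of the pair
        dfg = df.loc[df.group.isin(gp), cols]
        # compare each row to the most frequent row, then count matches per column
        dfx = pd.concat([dfx, (dfg == dfg.value_counts().index[0])
                         .sum().to_frame().T.assign(gs=len(dfg), compare=",".join(str(e) for e in gp))
                         ])

    # rebase 1 as 0 for comparisons (a count of 1 is just the reference row itself)
    dfx = dfx.reset_index(drop=True).replace(1, 0).astype(str)
    # format as required: "<matches> / <group size>"
    dfx.loc[:, cols] = dfx[cols].apply(lambda x: x + " / " + dfx["gs"])
    dfx.drop(columns="gs")

Output:

        base height weight   size compare
    0  3 / 3  2 / 3  2 / 3  2 / 3     0,1
    1  3 / 3  0 / 3  3 / 3  0 / 3     0,2
    2  2 / 2  0 / 2  0 / 2  0 / 2     1,2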
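To see the comparison step in isolation, here is a minimal sketch under stated assumptions. The sample data is hypothetical (not from the answer) and repeats one row so that the most frequent row is unambiguous. The helper name match_counts is also made up for illustration. It aligns the most frequent row with the columns explicitly and collects the per-pair rows in a list before a single pd.concat, which is a variation on the loop in the answer rather than the answer's exact code.

```python
import itertools

import pandas as pd

# Hypothetical sample data: the row (10, "M") appears twice, so it is the
# unambiguous most frequent row for the pair of groups (0, 1).
df = pd.DataFrame({
    "group":  [0, 0, 0, 1],
    "height": [10, 10, 20, 30],
    "size":   ["M", "M", "M", "S"],
})
cols = ["height", "size"]


def match_counts(frame, pair, cols):
    """Per column, count rows of the two groups that equal the most frequent row."""
    sub = frame.loc[frame["group"].isin(pair), cols]
    # value_counts() sorts by frequency, so the first index entry is the
    # most frequent row; label its values with the column names
    most_common = pd.Series(sub.value_counts().index[0], index=cols)
    # truth matrix: True where a cell equals the reference row's value
    truth = sub.eq(most_common, axis="columns")
    # summing booleans column-wise counts the matches
    return truth.sum()


rows = []
for pair in itertools.combinations(df["group"].unique(), 2):
    counts = match_counts(df, pair, cols)
    rows.append(counts.to_frame().T.assign(compare=",".join(str(g) for g in pair)))

# one concat at the end instead of growing a DataFrame inside the loop
result = pd.concat(rows, ignore_index=True)
print(result)
```

With this data there is a single pair, "0,1": two of the four rows share the reference height and three share the reference size.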
