How to group by a column and output multiple tab-separated columns

One approach uses melt, groupby and unstack.

Original data

    In []: df
    Out[]:
       Patient Test  panel   gene alteration
    0        1    A     54    APC     E1345*
    1        1    B     54   TP53      Y205H
    2        1    C     54    APC     V2278V
    3        2    A     54   KRAS       G12D
    4        2    B     54   PTEN       L25L
    5        3    A     54   KRAS       G13D
    6        3    C     54   TP53      C141W
    7        3    C     54    APC     R876*
    8        3    A     54  ERBB2      L663P

Tidy data

pd.DataFrame.melt reshapes this table into a tidy (long) format:

    In []: tidy = df.melt(id_vars=['Patient', 'Test'], value_vars=['panel', 'gene', 'alteration'])

    In []: tidy
    Out[]:
        Patient Test    variable   value
    0         1    A       panel      54
    1         1    B       panel      54
    2         1    C       panel      54
    3         2    A       panel      54
    4         2    B       panel      54
    5         3    A       panel      54
    6         3    C       panel      54
    7         3    C       panel      54
    8         3    A       panel      54
    9         1    A        gene     APC
    10        1    B        gene    TP53
    11        1    C        gene     APC
    12        2    A        gene    KRAS
    13        2    B        gene    PTEN
    14        3    A        gene    KRAS
    15        3    C        gene    TP53
    16        3    C        gene     APC
    17        3    A        gene   ERBB2
    18        1    A  alteration  E1345*
    19        1    B  alteration   Y205H
    20        1    C  alteration  V2278V
    21        2    A  alteration    G12D
    22        2    B  alteration    L25L
    23        3    A  alteration    G13D
    24        3    C  alteration   C141W
    25        3    C  alteration   R876*
    26        3    A  alteration   L663P

Reshaping with groupby and unstack

    In []: (tidy.groupby(['Patient', 'Test', 'variable'])  # group by three levels of interest
       ...:   .first()                                   # access values as a dataframe
       ...:   .unstack(level=[1,2]))                     # pivot on levels [1, 2] of multiindex
    Out[]:
                  value
    Test              A                      B                      C
    variable alteration  gene panel alteration  gene panel alteration  gene panel
    Patient
    1            E1345*   APC    54      Y205H  TP53    54     V2278V   APC    54
    2              G12D  KRAS    54       L25L  PTEN    54        NaN   NaN   NaN
    3              G13D  KRAS    54        NaN   NaN   NaN      C141W  TP53    54

Using crosstab

This gives an equivalent result:

    In []: pd.crosstab(tidy.Patient,                # index
                       [tidy.Test, tidy.variable],  # columns
                       values=tidy.value,
                       aggfunc='first')             # get first value
    Out[]:
    Test              A                      B                      C
    variable alteration  gene panel alteration  gene panel alteration  gene panel
    Patient
    1            E1345*   APC    54      Y205H  TP53    54     V2278V   APC    54
    2              G12D  KRAS    54       L25L  PTEN    54        NaN   NaN   NaN
    3              G13D  KRAS    54        NaN   NaN   NaN      C141W  TP53    54
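The answer stops at the reshaped frame and does not show the tab-separated output that the title asks for. The script below is a minimal sketch of one way to get there, not the answerer's own code. It rebuilds the sample frame from the table shown, repeats the melt / groupby / unstack steps, and writes the result with `to_csv(sep='\t')`. The names `wide`, `flat` and `tsv`, and the `Test_variable` naming of the flattened columns, are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the table shown in the answer.
df = pd.DataFrame({
    "Patient": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "Test": ["A", "B", "C", "A", "B", "A", "C", "C", "A"],
    "panel": [54] * 9,
    "gene": ["APC", "TP53", "APC", "KRAS", "PTEN",
             "KRAS", "TP53", "APC", "ERBB2"],
    "alteration": ["E1345*", "Y205H", "V2278V", "G12D", "L25L",
                   "G13D", "C141W", "R876*", "L663P"],
})

# Long ("tidy") format: one row per (Patient, Test, variable) observation.
tidy = df.melt(id_vars=["Patient", "Test"],
               value_vars=["panel", "gene", "alteration"])

# One row per patient; columns indexed by (Test, variable).
wide = (tidy.groupby(["Patient", "Test", "variable"])
            .first()                  # keep the first value in each group
            .unstack(level=[1, 2]))   # move Test and variable into the columns
wide = wide["value"]                  # drop the constant top column level

# Hypothetical flattening step: join the two column levels so the
# tab-separated output has a single header row.
flat = wide.copy()
flat.columns = [f"{test}_{variable}" for test, variable in wide.columns]

# With no path given, to_csv returns the text instead of writing a file.
tsv = flat.to_csv(sep="\t")
print(tsv)
```

Because `first()` keeps only the first row of each group, the second Test A result for patient 3 (ERBB2 L663P) and the second Test C result (APC R876*) do not appear in the output. This matches the tables in the answer. If every alteration must be kept, a different aggregation would be needed.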

