Convert a list of arrays to a DataFrame

Hello, I have a dataset that looks like this:

array([['1;"Female";133;132;124;"118";"64.5";816932'],
       ['2;"Male";140;150;124;".";"72.5";1001121'],
       ['3;"Male";139;123;150;"143";"73.3";1038437'],
       ['4;"Male";133;129;128;"172";"68.8";965353'],
       ['5;"Female";137;132;134;"147";"65.0";951545'],
       ['6;"Female";99;90;110;"146";"69.0";928799'],
       ['7;"Female";138;136;131;"138";"64.5";991305']], dtype=object)

I want to convert this into a DataFrame with these columns:

columns = ["Gender";"FSIQ";"VIQ";"PIQ";"Weight";"Height";"MRI_Count"]

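Note: within the list of arrays, the row values are separated by a semicolon (;). Please help me organize this into a DataFrame with the column names above and the row values from the array.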


潇潇雨雨
2 answers

子衿沉夜

Create a DataFrame and use Series.str.split with expand=True to build the new columns:

    import numpy as np
    import pandas as pd

    a = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
                  ['2;"Male";140;150;124;".";"72.5";1001121'],
                  ['3;"Male";139;123;150;"143";"73.3";1038437'],
                  ['4;"Male";133;129;128;"172";"68.8";965353'],
                  ['5;"Female";137;132;134;"147";"65.0";951545'],
                  ['6;"Female";99;90;110;"146";"69.0";928799'],
                  ['7;"Female";138;136;131;"138";"64.5";991305']], dtype=object)

    # Each row holds a single string, so take column 0 and split it on ';'
    df = pd.DataFrame(a)[0].str.split(';', expand=True)
    df.columns = ['ID', "Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

Finally, some data cleaning: remove the " characters with Series.str.strip, and convert the columns to numeric by using DataFrame.apply with to_numeric:

    df['Gender'] = df['Gender'].str.strip('"')
    c = ["ID", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]
    # errors='coerce' turns values that cannot be parsed (such as ".") into NaN
    df[c] = df[c].apply(lambda x: pd.to_numeric(x.str.strip('"'), errors='coerce'))
    print(df)

       ID  Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI_Count
    0   1  Female   133  132  124   118.0    64.5     816932
    1   2    Male   140  150  124     NaN    72.5    1001121
    2   3    Male   139  123  150   143.0    73.3    1038437
    3   4    Male   133  129  128   172.0    68.8     965353
    4   5  Female   137  132  134   147.0    65.0     951545
    5   6  Female    99   90  110   146.0    69.0     928799
    6   7  Female   138  136  131   138.0    64.5     991305
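To see in isolation what the cleaning step does to the "." placeholder in the Weight column, here is a minimal sketch on a made-up three-value Series. The names `raw` and `cleaned` are only for illustration and do not appear in the answer.

```python
import pandas as pd

# Hypothetical sample mimicking the quoted Weight values after the split
raw = pd.Series(['"118"', '"."', '"143"'])

# Strip the surrounding double quotes, then parse as numbers;
# anything that is not a valid number is coerced to NaN
cleaned = pd.to_numeric(raw.str.strip('"'), errors="coerce")

print(cleaned)
print(cleaned.dtype)
```

The "." entry comes back as NaN, and the Series becomes float64 so that it can hold the missing value. With the default errors='raise', the same call would raise an error on "." instead.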

婷婷同学_

Another potential solution is to use io.StringIO and pandas.read_csv. Simply join the elements of the array with a \n character:

    from io import StringIO

    import numpy as np
    import pandas as pd

    # Setup
    a = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
                  ['2;"Male";140;150;124;".";"72.5";1001121'],
                  ['3;"Male";139;123;150;"143";"73.3";1038437'],
                  ['4;"Male";133;129;128;"172";"68.8";965353'],
                  ['5;"Female";137;132;134;"147";"65.0";951545'],
                  ['6;"Female";99;90;110;"146";"69.0";928799'],
                  ['7;"Female";138;136;131;"138";"64.5";991305']])

    columns = ["Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

    # Treat the joined strings as a semicolon-separated file; "." marks a missing value
    df = pd.read_csv(StringIO('\n'.join(a.ravel())), header=None,
                     sep=';', names=columns, na_values=['.'])

[out]

       Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI_Count
    1  Female   133  132  124   118.0    64.5     816932
    2    Male   140  150  124     NaN    72.5    1001121
    3    Male   139  123  150   143.0    73.3    1038437
    4    Male   133  129  128   172.0    68.8     965353
    5  Female   137  132  134   147.0    65.0     951545
    6  Female    99   90  110   146.0    69.0     928799
    7  Female   138  136  131   138.0    64.5     991305

pandas should do a good job of inferring the dtypes:

    print(df.info())

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 7 entries, 1 to 7
    Data columns (total 7 columns):
     #   Column     Non-Null Count  Dtype
    ---  ------     --------------  -----
     0   Gender     7 non-null      object
     1   FSIQ       7 non-null      int64
     2   VIQ        7 non-null      int64
     3   PIQ        7 non-null      int64
     4   Weight     6 non-null      float64
     5   Height     7 non-null      float64
     6   MRI_Count  7 non-null      int64
    dtypes: float64(2), int64(4), object(1)
    memory usage: 448.0+ bytes
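One detail worth noting: columns lists seven names while each row has eight fields, so read_csv uses the leading field as the index, which is why the index in the output runs from 1 to 7. If you would rather keep that field as a regular column, a minimal sketch is to pass eight names. The "ID" label and the names `names` and `df_with_id` are my own assumptions, and only two rows are used to keep it short.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Two sample rows in the same format as the question's array
a = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
              ['2;"Male";140;150;124;".";"72.5";1001121']], dtype=object)

# Eight names, one per field; "ID" is an assumed label for the first field
names = ["ID", "Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

df_with_id = pd.read_csv(StringIO("\n".join(a.ravel())), header=None,
                         sep=";", names=names, na_values=["."])

print(df_with_id)
```

With one name per field, the frame gets the default 0-based index and ID is parsed as an ordinary integer column.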
