子衿沉夜
Create a DataFrame and use Series.str.split with expand=True to produce the new columns:

    import numpy as np
    import pandas as pd

    a = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
                  ['2;"Male";140;150;124;".";"72.5";1001121'],
                  ['3;"Male";139;123;150;"143";"73.3";1038437'],
                  ['4;"Male";133;129;128;"172";"68.8";965353'],
                  ['5;"Female";137;132;134;"147";"65.0";951545'],
                  ['6;"Female";99;90;110;"146";"69.0";928799'],
                  ['7;"Female";138;136;131;"138";"64.5";991305']], dtype=object)

    # Take the single column of strings and split each one on ';' into separate columns
    df = pd.DataFrame(a)[0].str.split(';', expand=True)
    df.columns = ['ID', "Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

Finally, some data cleaning: remove the " characters with Series.str.strip, and convert the columns to numeric using DataFrame.apply with to_numeric:

    df['Gender'] = df['Gender'].str.strip('"')

    c = ["ID", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]
    # errors='coerce' turns values that cannot be parsed (such as ".") into NaN
    df[c] = df[c].apply(lambda x: pd.to_numeric(x.str.strip('"'), errors='coerce'))

    print(df)
       ID  Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI_Count
    0   1  Female   133  132  124   118.0    64.5     816932
    1   2    Male   140  150  124     NaN    72.5    1001121
    2   3    Male   139  123  150   143.0    73.3    1038437
    3   4    Male   133  129  128   172.0    68.8     965353
    4   5  Female   137  132  134   147.0    65.0     951545
    5   6  Female    99   90  110   146.0    69.0     928799
    6   7  Female   138  136  131   138.0    64.5     991305
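To see the cleaning step in isolation, here is a minimal sketch on a small hypothetical Series (the name `raw` and its three values are made up for illustration, modelled on the Weight column). It assumes only that the values are strings wrapped in double quotes, with "." standing in for a missing value.

```python
import pandas as pd

# Hypothetical sample modelled on the Weight column: quoted strings, "." means missing
raw = pd.Series(['"118"', '"."', '"143"'])

# Strip the surrounding double quotes, then convert to numbers;
# errors='coerce' replaces anything unparseable (here ".") with NaN
cleaned = pd.to_numeric(raw.str.strip('"'), errors='coerce')

print(cleaned)
print(cleaned.dtype)
```

Because one value becomes NaN, the result is a float column, which matches the Weight column in the output above.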
婷婷同学_
Another potential solution is to use io.StringIO and pandas.read_csv. Just join the elements of the array with a \n character:

    from io import StringIO

    import numpy as np
    import pandas as pd

    # Setup
    a = np.array([['1;"Female";133;132;124;"118";"64.5";816932'],
                  ['2;"Male";140;150;124;".";"72.5";1001121'],
                  ['3;"Male";139;123;150;"143";"73.3";1038437'],
                  ['4;"Male";133;129;128;"172";"68.8";965353'],
                  ['5;"Female";137;132;134;"147";"65.0";951545'],
                  ['6;"Female";99;90;110;"146";"69.0";928799'],
                  ['7;"Female";138;136;131;"138";"64.5";991305']])

    columns = ["Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

    # Only seven names are given for eight fields, so the first field (the ID)
    # ends up as the index, as the output below shows
    df = pd.read_csv(StringIO('\n'.join(a.ravel())), header=None, sep=';',
                     names=columns, na_values=['.'])

[Output]

       Gender  FSIQ  VIQ  PIQ  Weight  Height  MRI_Count
    1  Female   133  132  124   118.0    64.5     816932
    2    Male   140  150  124     NaN    72.5    1001121
    3    Male   139  123  150   143.0    73.3    1038437
    4    Male   133  129  128   172.0    68.8     965353
    5  Female   137  132  134   147.0    65.0     951545
    6  Female    99   90  110   146.0    69.0     928799
    7  Female   138  136  131   138.0    64.5     991305

pandas should do a good job of inferring the dtypes:

    df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 7 entries, 1 to 7
    Data columns (total 7 columns):
     #   Column     Non-Null Count  Dtype
    ---  ------     --------------  -----
     0   Gender     7 non-null      object
     1   FSIQ       7 non-null      int64
     2   VIQ        7 non-null      int64
     3   PIQ        7 non-null      int64
     4   Weight     6 non-null      float64
     5   Height     7 non-null      float64
     6   MRI_Count  7 non-null      int64
    dtypes: float64(2), int64(4), object(1)
    memory usage: 448.0+ bytes
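If you would rather keep the ID as an ordinary column instead of the index, one possible variation is to pass all eight names. This is a sketch rather than part of the answer above: the "ID" name is borrowed from the first answer, and only the first two rows are used to keep it short.

```python
from io import StringIO

import pandas as pd

# First two rows of the data from the question, already as plain strings
rows = ['1;"Female";133;132;124;"118";"64.5";816932',
        '2;"Male";140;150;124;".";"72.5";1001121']

# Eight names for eight fields, so no field is taken as the index
columns = ["ID", "Gender", "FSIQ", "VIQ", "PIQ", "Weight", "Height", "MRI_Count"]

df = pd.read_csv(StringIO("\n".join(rows)), header=None, sep=";",
                 names=columns, na_values=["."])

print(df)
print(df.dtypes)
```

With this variation the frame gets the default integer index starting at 0, and "." in Weight is still read as NaN.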