pd.get_dummies(df[col])
.
chunks = (len(df) // 10000) + 1
df_list = np.array_split(df, chunks)
pd.get_dummies(df)
df[col]
df
df_list
.
for i, df_chunk in enumerate(df_list):
    print("chunk", i)
    x, y = preprocess_data(df_chunk)
    # Each call copies everything accumulated so far into a new frame
    super_x = pd.concat([super_x, x], axis=0)
    super_y = pd.concat([super_y, y], axis=0)
    print(datetime.datetime.utcnow())
preprocess_data(df_chunk)
pd.concat()
?
chunks 6
chunk 0
2016-04-08 00:22:17.728849
chunk 1
2016-04-08 00:22:42.387693
chunk 2
2016-04-08 00:23:43.124381
chunk 3
2016-04-08 00:25:30.249369
chunk 4
2016-04-08 00:28:11.922305
chunk 5
2016-04-08 00:32:00.357365
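The gaps between the timestamps above grow with every chunk. One likely contributor is that each pd.concat call inside the loop copies all rows accumulated so far, so later iterations do more work than earlier ones. A minimal sketch of a common alternative is to collect the per-chunk results in lists and concatenate once at the end. The preprocess_data body, the column names "feature" and "target", the helper name process_in_chunks, and the fillna(0) step are all assumptions made for illustration, not the code from the question.

```python
import pandas as pd


def preprocess_data(df_chunk):
    # Hypothetical stand-in for the question's preprocess_data:
    # one-hot encode an assumed "feature" column, pass "target" through.
    x = pd.get_dummies(df_chunk["feature"], dtype=float)
    y = df_chunk[["target"]]
    return x, y


def process_in_chunks(df, chunk_size=10000):
    # Hypothetical helper: accumulate per-chunk results in plain lists.
    x_parts, y_parts = [], []
    for start in range(0, len(df), chunk_size):
        df_chunk = df.iloc[start:start + chunk_size]
        x, y = preprocess_data(df_chunk)
        x_parts.append(x)
        y_parts.append(y)
    # Concatenate once, so earlier chunks are copied a single time.
    # A chunk that lacks a category has no column for it; after the
    # concat those cells are NaN, and this sketch fills them with 0.
    super_x = pd.concat(x_parts, axis=0).fillna(0)
    super_y = pd.concat(y_parts, axis=0)
    return super_x, super_y


if __name__ == "__main__":
    demo = pd.DataFrame({"feature": list("abcab") * 5, "target": range(25)})
    demo_x, demo_y = process_in_chunks(demo, chunk_size=4)
    print(demo_x.shape, demo_y.shape)
```

Whether this removes the whole slowdown depends on what preprocess_data itself does, so timing the preprocessing and the concatenation separately would show where the time actually goes.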
慕神8447489