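Subset a dataset on two conditions, save each data frame to a .csv file, then iterate over the files and plot graphs

I'm new to data science and need help with the following: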


(I) Split the dataset by the unique groups in one column (Region, in my case) and by another column, country.


(II) I want to save each data frame as a .csv file named regionname_country.csv, for example west_GER.csv, east_POL.csv.


(III) If possible, I'd like to iterate over the .csv files with a for loop and draw a scatter plot of Education vs. Age for each df.


(IV) Finally, save my plots/figures in a PDF file (4 figures per page).


'df'

   Region, country, Age, Education, Income, FICO, Target

1   west, GER, 43, 1, 47510, 710, 1

2   east, POL, 32, 2, 73640, 723, 1

3   east, POL, 22, 2, 88525, 610, 0

4   west, GER, 55, 0, 31008, 592, 0

5   north, USA, 19, 0, 18007, 599, 1

6   south, PER, 27, 2, 68850, 690, 0

7   south, BRZ, 56, 3, 71065, 592, 0

8   north, USA, 39, 1, 98004, 729, 1

9   east, JPN, 36, 2, 51361, 692, 0

10  west, ESP, 59, 1, 98643, 729, 1


Desired result:


 # df_to_csv : 'west_GER.csv'

west, GER, 43, 1, 47510, 710, 1 

west, GER, 55, 0, 31008, 592, 0 

# west_ESP.csv

west, ESP, 59, 1, 98643, 729, 1 

# east_POL.csv

east, POL, 32, 2, 73640, 723, 1 


.

.

.


# north_USA.csv

north, USA, 39, 1, 98004, 729, 1  

north, USA, 19, 0, 18007, 599, 1


www说
2 answers

呼如林

For Python, parts (I) and (II):

    import pandas as pd

    # groupby yields one ((Region, country), sub-frame) pair per unique combination,
    # so the file name can be built directly from the group key.
    for (region, country), group in df.groupby(["Region", "country"]):
        group.to_csv(f"{region}_{country}.csv", index=False)

Parts (III) and (IV):

    import glob
    import matplotlib.pyplot as plt
    import pandas as pd

    fig, axs = plt.subplots(nrows=2, ncols=2)
    # zip stops after four files, because the 2x2 grid has only four axes
    for ax, file in zip(axs.flatten(), sorted(glob.glob("./*.csv"))):
        df_temp = pd.read_csv(file)
        region_temp = df_temp["Region"][0]
        country_temp = df_temp["country"][0]

        ax.scatter(df_temp["Age"], df_temp["Education"])
        ax.set_title(f"Region:{region_temp}, Country:{country_temp}")
        ax.set_xlabel("Age")
        ax.set_ylabel("Education")

    fig.tight_layout()
    fig.savefig("scatter.pdf")
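A single 2x2 figure holds only four plots, so with more than four region/country files the rest never reach the PDF. Here is a minimal sketch of one way to get four plots per page across several pages, using matplotlib's PdfPages. The function name plot_csvs_to_pdf and its parameters are hypothetical, and it assumes every CSV has the Region, country, Age and Education columns from the question.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def plot_csvs_to_pdf(csv_files, pdf_path, nrows=2, ncols=2):
    """Hypothetical helper: one Age-vs-Education scatter plot per CSV file,
    nrows * ncols plots per PDF page. Returns the number of pages written."""
    per_page = nrows * ncols
    n_pages = math.ceil(len(csv_files) / per_page)
    with PdfPages(pdf_path) as pdf:
        for page in range(n_pages):
            # Files that belong on this page
            chunk = csv_files[page * per_page:(page + 1) * per_page]
            # squeeze=False keeps axs two-dimensional even for a 1x1 grid
            fig, axs = plt.subplots(nrows=nrows, ncols=ncols, squeeze=False)
            axes = axs.flatten()
            for ax, file in zip(axes, chunk):
                df_temp = pd.read_csv(file)
                region = df_temp["Region"].iloc[0]
                country = df_temp["country"].iloc[0]
                ax.scatter(df_temp["Age"], df_temp["Education"])
                ax.set_title(f"Region: {region}, Country: {country}")
                ax.set_xlabel("Age")
                ax.set_ylabel("Education")
            # Hide the axes left empty on a partially filled last page
            for ax in axes[len(chunk):]:
                ax.set_visible(False)
            fig.tight_layout()
            pdf.savefig(fig)
            plt.close(fig)
    return n_pages
```

It could be called as plot_csvs_to_pdf(sorted(glob.glob("./*_*.csv")), "scatter.pdf") after importing glob. Closing each figure after saving it keeps memory use flat when there are many groups.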

慕侠2389804

In R, you can do it like this:

    library(tidyverse)

    # Get the data as a list of data frames, one per Region/country pair
    df %>%
      select(Region, country, Education, Age) %>%
      group_split(Region, country) -> split_data

    # From the list of data frames, create a list of plots
    list_plots <- map(split_data, ~ggplot(.) + aes(Education, Age) +
                        geom_point() +
                        ggtitle(sprintf('Plot for region %s and country %s',
                                        first(.$Region), first(.$country))))

    # Write the plots to a PDF and write the CSVs
    pdf("plots.pdf", onefile = TRUE)
    for (i in seq_along(list_plots)) {
      # Write only the i-th group, not the whole list
      write.csv(split_data[[i]], sprintf('%s_%s.csv',
                split_data[[i]]$Region[1], split_data[[i]]$country[1]), row.names = FALSE)
      print(list_plots[[i]])
    }
    dev.off()