Consider the following CSV file, test.csv:
"x","y","A","B"
8000000000,"0,1","0.113948,0.113689",0.114042
8000000000,"0,1","0.114063,0.113823",0.114175
8000000000,"0,1","0.114405,0.114366",0.114524
8000000000,"0,1,2,3","0.167543,0.172369,0.419197,0.427285",0.427576
8000000000,"0,1,2,3","0.167784,0.172145,0.418624,0.426492",0.428736
8000000000,"0,1,2,3","0.168121,0.172729,0.419768,0.427467",0.428578
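My goal is to group the rows by the columns "x" and "y" and compute the arithmetic mean of the columns "A" and "B".
My first approach was to use a combination of groupby() and mean() in Pandas: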
import pandas

if __name__ == "__main__":
    data = pandas.read_csv("test.csv", header=0)
    data = data.groupby(["x", "y"], as_index=False).mean()
    print(data)
Running this script produces the following output:
            x        y         B
0  8000000000      0,1  0.114247
1  8000000000  0,1,2,3  0.428297
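As we can see, reaching the goal for the single-valued column "B" is simple. However, column "A" is omitted, because its cells are comma-separated strings rather than numbers. Instead, I would like column "A" to hold a string containing the arithmetic mean of each comma-separated value, position by position. The desired output should look like this:
            x        y                                    A         B
0  8000000000      0,1                    0.114139,0.113959  0.114247
1  8000000000  0,1,2,3  0.167816,0.172414,0.419196,0.427081  0.428297
Does anyone know how to do this?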
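One possible approach, offered as a minimal sketch rather than a definitive answer: pass a custom aggregation function for "A" and keep the built-in mean for "B", using named aggregation in agg(). The helper name mean_of_lists and the inline CSV_TEXT (which stands in for test.csv so the example runs on its own) are hypothetical, not part of the original script. The sketch assumes every cell of "A" within a group has the same number of values, which holds here because "y" is part of the group key, and it assumes six decimal places are wanted, as in the desired output.

```python
import io

import numpy as np
import pandas as pd

# Inline copy of test.csv so the sketch is self-contained (assumption:
# in the real script you would call pd.read_csv("test.csv") instead).
CSV_TEXT = '''"x","y","A","B"
8000000000,"0,1","0.113948,0.113689",0.114042
8000000000,"0,1","0.114063,0.113823",0.114175
8000000000,"0,1","0.114405,0.114366",0.114524
8000000000,"0,1,2,3","0.167543,0.172369,0.419197,0.427285",0.427576
8000000000,"0,1,2,3","0.167784,0.172145,0.418624,0.426492",0.428736
8000000000,"0,1,2,3","0.168121,0.172729,0.419768,0.427467",0.428578
'''


def mean_of_lists(series: pd.Series) -> str:
    """Average comma-separated float strings position by position.

    Hypothetical helper: assumes all cells in the group hold the same
    number of comma-separated values.
    """
    rows = [[float(v) for v in cell.split(",")] for cell in series]
    means = np.mean(rows, axis=0)  # column-wise mean across the group's rows
    return ",".join(f"{m:.6f}" for m in means)


data = pd.read_csv(io.StringIO(CSV_TEXT), header=0)

# Named aggregation: custom function for "A", built-in mean for "B".
result = data.groupby(["x", "y"], as_index=False).agg(
    A=("A", mean_of_lists),
    B=("B", "mean"),
)
print(result)
```

With the sample data, column "A" of result should contain the strings from the desired output above. If the number of values per cell can differ within a group, np.mean would fail on the ragged list, so that case would need separate handling.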
哆啦的时光机