For example, this is my test data:
test = spark.createDataFrame([
(0, 1, 5, "2018-06-03", "Region A"),
(1, 1, 2, "2018-06-04", "Region B"),
(2, 2, 1, "2018-06-03", "Region B"),
(3, 3, 1, "2018-06-01", "Region A"),
(3, 1, 3, "2018-06-05", "Region A"),
])\
.toDF("orderid", "customerid", "price", "transactiondate", "location")
test.show()
I can get the aggregated data like this:
from pyspark.sql import functions as F  # F.sum is Spark's aggregate, not Python's built-in sum
test.groupBy("customerid", "location").agg(F.sum("price")).show()
But I would also like percentage data, like this:
+----------+--------+----------+----------+
|customerid|location|sum(price)|percentage|
+----------+--------+----------+----------+
|         1|Region B|         2|       20%|
|         1|Region A|         8|       80%|
|         3|Region A|         1|      100%|
|         2|Region B|         1|      100%|
+----------+--------+----------+----------+
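I would like to know:
How can I do this? Perhaps with a window function?
Can I pivot the table into something like this (with both percentage and sum columns)?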
