Find the maximum row per group in a Spark DataFrame
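I have names that are translated to ids by two different systems, sa and sb. Each Row contains name, id_sa and id_sb. My goal is to produce a mapping from id_sa to id_sb such that, for each id_sa, the corresponding id_sb is the most frequent id among all names attached to that id_sa.

For example, given the following rows: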
[Row(name='n1', id_sa='a1', id_sb='b1'), Row(name='n2', id_sa='a1', id_sb='b2'), Row(name='n3', id_sa='a1', id_sb='b2'), Row(name='n4', id_sa='a2', id_sb='b2')]
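the goal is to map a1 to b2. The names associated with a1 are n1, n2 and n3, which map to b1, b2 and b2 respectively, so b2 is the most frequent id_sb among the names attached to a1. In the same way, a2 is mapped to b2.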
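To pin down what "most frequent id_sb per id_sa" means for the example rows, here is a minimal plain-Python sketch of the intended result. It is a reference for the logic only, not a Spark solution. The dictionaries stand in for the Row objects, and the variable names (rows, counts, mapping) are illustrative assumptions.

```python
from collections import Counter, defaultdict

# The example rows, written as plain dicts instead of Spark Row objects.
rows = [
    {"name": "n1", "id_sa": "a1", "id_sb": "b1"},
    {"name": "n2", "id_sa": "a1", "id_sb": "b2"},
    {"name": "n3", "id_sa": "a1", "id_sb": "b2"},
    {"name": "n4", "id_sa": "a2", "id_sb": "b2"},
]

# For every id_sa, count how often each id_sb occurs.
counts = defaultdict(Counter)
for row in rows:
    counts[row["id_sa"]][row["id_sb"]] += 1

# Keep the most frequent id_sb per id_sa.
# Assumption: ties are not handled specially; most_common picks one.
mapping = {id_sa: c.most_common(1)[0][0] for id_sa, c in counts.items()}

print(mapping)  # {'a1': 'b2', 'a2': 'b2'}
```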
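I was hoping to use groupBy(df.id_sa) on the DataFrame, but I don't know what to do next. I would like an aggregation that ends up producing the following rows: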
[Row(id_sa=a1, max_id_sb=b2), Row(id_sa=a2, max_id_sb=b2)]