I have a DataFrame like this:
Studentname  Speciality
Alex         ["Physics","Math","biology"]
Sam          ["Economics","History","Math","Physics"]
Claire       ["Political science,Physics"]
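I want to find every student whose Speciality contains both "Physics" and "Math", so the output should have 2 rows: Alex and Sam.
Here is what I tried: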
from pyspark.sql.functions import array_contains
from pyspark.sql import functions as F

def student_info():
    # `spark` is an existing SparkSession
    student_df = spark.read.parquet("s3a://studentdata")
    a1 = ["Physics", "Math"]
    df = student_df
    for a in a1:
        # each pass filters the original student_df, not the previous result
        df = student_df.filter(array_contains(student_df.Speciality, a))
        print(df.count())

student_info()
output:
3
2
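How can I filter an array column so that it keeps only rows whose array contains every element of a given array (i.e. the given array is a subset)?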