PySpark: Filter a DataFrame by substrings from another table

Use contains() in the join condition, with left_anti as the join type. A left anti join returns every row of the first table that has no match in the second table. Here, a row of df_a counts as matched when its word is a substring of some word_1 in df_b.

    df_a.show()
    +-----+---------+
    | word|frequency|
    +-----+---------+
    |  git|        5|
    |stack|       10|
    |match|       15|
    |other|        3|
    +-----+---------+

    df_b.show()
    +-------------+-----------+
    |       word_1|frequency_1|
    +-------------+-----------+
    |       github|          5|
    |        match|          2|
    |stackoverflow|         10|
    |      b_entry|          7|
    +-------------+-----------+

    # contains() is a Column method, so nothing from pyspark.sql.functions is required
    df_a.join(df_b, df_b.word_1.contains(df_a.word), "left_anti").show()
    +-----+---------+
    | word|frequency|
    +-----+---------+
    |other|        3|
    +-----+---------+
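To make the semantics concrete, here is a minimal plain-Python sketch (no Spark involved) of what the left anti join with contains() computes on the sample data above. The helper name left_anti_contains and the variables rows_a and words_b are hypothetical, introduced only for illustration; this models the logic and is not how Spark executes the join.

```python
# Plain-Python model of the left anti join above (illustration only, no Spark).
# left_anti_contains is a hypothetical helper name, not a PySpark API.

def left_anti_contains(left_rows, right_words):
    """Keep each left row whose word is not a substring of any right word."""
    return [
        row
        for row in left_rows
        if not any(row[0] in candidate for candidate in right_words)
    ]


# Same sample data as df_a (word, frequency) and the word_1 column of df_b.
rows_a = [("git", 5), ("stack", 10), ("match", 15), ("other", 3)]
words_b = ["github", "match", "stackoverflow", "b_entry"]

result = left_anti_contains(rows_a, words_b)
print(result)  # [('other', 3)]
```

As in the Spark version, "git", "stack" and "match" are dropped because each appears inside some word_1 value, and only "other" survives. The substring test is case-sensitive in both this sketch and Column.contains.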
