Given a DataFrame:
+---+-----------+---------+-------+------------+
| id|      score|tx_amount|isValid|    greeting|
+---+-----------+---------+-------+------------+
|  1|        0.2|    23.78|   true| hello_world|
|  2|        0.6|    12.41|  false|byebye_world|
+---+-----------+---------+-------+------------+
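I want to explode these columns into rows, with the values in a single column named "col_value". That part works, but I also want to apply logic to each row so that I get the following result: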
+---+------------+--------+---------+----------+-------+
| id|   col_value|is_score|is_amount|is_boolean|is_text|
+---+------------+--------+---------+----------+-------+
|  1|         0.2|       Y|        N|         N|      N|
|  1|       23.78|       N|        Y|         N|      N|
|  1|        true|       N|        N|         Y|      N|
|  1| hello_world|       N|        N|         N|      Y|
|  2|         0.6|       Y|        N|         N|      N|
|  2|       12.41|       N|        Y|         N|      N|
|  2|       false|       N|        N|         Y|      N|
|  2|byebye_world|       N|        N|         N|      Y|
+---+------------+--------+---------+----------+-------+
What I have so far:
.withColumn("cols", F.explode(F.arrays_zip(F.array("score", "tx_amount", "isValid", "greeting")))) \
.select("id", F.col("cols.*")) \
.withColumnRenamed("0", "col_value") \
.withColumn("is_score", F.lit("Y") if col1_type == "score" else F.lit("N")) \
.withColumn("is_amount", F.lit("Y") if col2_type == "amount" else F.lit("N")) \
.withColumn("is_boolean", F.lit("Y") if col3_type == "boolean" else F.lit("N")) \
.withColumn("is_text", F.lit("Y") if col4_type == "text" else F.lit("N")) \
.show()
How can I do this after the explode to get the correct result?