PySpark - explode columns into rows and set values based on logic

Given a dataframe:


+---+-----------+---------+-------+------------+
| id|      score|tx_amount|isValid|    greeting|
+---+-----------+---------+-------+------------+
|  1|        0.2|    23.78|   true| hello_world|
|  2|        0.6|    12.41|  false|byebye_world|
+---+-----------+---------+-------+------------+

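I want to explode these columns into rows under a single column named "col_value". That part works, but I also want to apply logic to each row so that I get a result like this: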


+---+------------+--------+---------+----------+-------+
| id|   col_value|is_score|is_amount|is_boolean|is_text|
+---+------------+--------+---------+----------+-------+
|  1|         0.2|       Y|        N|         N|      N|
|  1|       23.78|       N|        Y|         N|      N|
|  1|        true|       N|        N|         Y|      N|
|  1| hello_world|       N|        N|         N|      Y|
|  2|         0.6|       Y|        N|         N|      N|
|  2|       12.41|       N|        Y|         N|      N|
|  2|       false|       N|        N|         Y|      N|
|  2|byebye_world|       N|        N|         N|      Y|
+---+------------+--------+---------+----------+-------+

What I have so far:


df.withColumn("cols", F.explode(F.arrays_zip(F.array("score", "tx_amount", "isValid", "greeting")))) \
        .select("id", F.col("cols.*")) \
        .withColumnRenamed("0", "col_value") \
        .withColumn("is_score", F.lit("Y") if col1_type == "score" else F.lit("N")) \
        .withColumn("is_amount", F.lit("Y") if col2_type == "amount" else F.lit("N")) \
        .withColumn("is_boolean", F.lit("Y") if col3_type == "boolean" else F.lit("N")) \
        .withColumn("is_text", F.lit("Y") if col4_type == "text" else F.lit("N")) \
        .show()


How can I do this after the explode to get the correct result?


MMTTMM
1 Answer

偶然的你

I think what you want can be achieved by applying a regex to your col_value to determine whether it is text, boolean, amount or score. As long as score never exceeds 1.0 and amount is always above 1.0, the code below works. If that is not the case, let me know and I will update the logic.

from pyspark.sql import functions as F

(
    df.withColumn("cols", F.explode(F.arrays_zip(F.array("score", "tx_amount", "isValid", "greeting"))))
    .select("id", F.col("cols.*"))
    .withColumnRenamed("0", "col_value")
    # first run of letters in the value; empty string when there is none
    .withColumn("text", F.regexp_extract(F.col("col_value"), "([A-Za-z]+)", 1))
    # "true"/"false" count as boolean and are cleared from the text column
    .withColumn("boolean", F.when((F.col("text") == "true") | (F.col("text") == "false"), F.col("text")).otherwise(F.lit("")))
    .withColumn("text", F.when(F.col("text") == F.col("boolean"), F.lit("")).otherwise(F.col("text")))
    # the whole value as a decimal number; empty string when it is not numeric
    .withColumn("numeric", F.regexp_extract(F.col("col_value"), r"^([0-9]+\.?[0-9]*)$", 1))
    .withColumn("is_text", F.when(F.col("text") != "", F.lit("Y")).otherwise(F.lit("N")))
    # numbers up to 1 are treated as scores, anything larger as an amount
    .withColumn("is_score", F.when(F.col("numeric").cast("double") <= 1, F.lit("Y")).otherwise(F.lit("N")))
    .withColumn("is_amount", F.when(F.col("numeric").cast("double") > 1, F.lit("Y")).otherwise(F.lit("N")))
    .withColumn("is_boolean", F.when(F.col("boolean") != "", F.lit("Y")).otherwise(F.lit("N")))
    .select("id", "col_value", "is_score", "is_amount", "is_boolean", "is_text")
    .show()
)

+---+------------+--------+---------+----------+-------+
| id|   col_value|is_score|is_amount|is_boolean|is_text|
+---+------------+--------+---------+----------+-------+
|  1|         0.2|       Y|        N|         N|      N|
|  1|       23.78|       N|        Y|         N|      N|
|  1|        true|       N|        N|         Y|      N|
|  1| hello_world|       N|        N|         N|      Y|
|  2|         0.6|       Y|        N|         N|      N|
|  2|       12.41|       N|        Y|         N|      N|
|  2|       false|       N|        N|         Y|      N|
|  2|byebye_world|       N|        N|         N|      Y|
+---+------------+--------+---------+----------+-------+