江户川乱折腾
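You can first turn the string into valid JSON by replacing the single quotes with double quotes, and then use from_json to convert it into a struct column or a map column.

If you know the schema of the dictionary, you can do it like this:

    from pyspark.sql.functions import col, explode, first, from_json, regexp_replace
    from pyspark.sql.types import MapType, StringType, StructField, StructType

    # Assumes an active SparkSession is available as `spark`.
    data = [
        (1, 2, "{'c': 1, 'd': 2}"),
        (3, 4, "{'c': 7, 'd': 0}"),
        (5, 6, "{'c': 5, 'd': 4}"),
    ]
    df = spark.createDataFrame(data, ["a", "b", "dic"])

    # Both values are read as strings, as declared in the schema.
    schema = StructType([
        StructField("c", StringType(), True),
        StructField("d", StringType(), True),
    ])

    # Replace ' with " so the string is valid JSON, then parse it into a struct.
    df_struct = df.withColumn("dic", from_json(regexp_replace(col("dic"), "'", "\""), schema))
    df_struct.select("a", "b", "dic.*").show(truncate=False)

    #+---+---+---+---+
    #|a  |b  |c  |d  |
    #+---+---+---+---+
    #|1  |2  |1  |2  |
    #|3  |4  |7  |0  |
    #|5  |6  |5  |4  |
    #+---+---+---+---+

If you do not know all the keys, you can convert the string to a map instead of a struct, then explode it and pivot to get the keys as columns. This starts again from the original df, where dic is still a string:

    df_map = df.withColumn("dic", from_json(regexp_replace(col("dic"), "'", "\""), MapType(StringType(), StringType())))\
        .select("a", "b", explode("dic"))\
        .groupBy("a", "b")\
        .pivot("key")\
        .agg(first("value"))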
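To see what the quote replacement does on its own, outside Spark, here is a minimal plain-Python sketch. The helper names quotes_to_json and safe_parse are hypothetical and are not part of PySpark; they only imitate the regexp_replace step followed by JSON parsing on individual strings. The sketch also shows one limitation worth keeping in mind: if a value itself contains an apostrophe, the swapped string is no longer valid JSON.

```python
import json
from typing import Optional


def quotes_to_json(raw: str) -> str:
    # Hypothetical helper: the plain-Python counterpart of
    # regexp_replace(col("dic"), "'", "\"") applied to a single value.
    return raw.replace("'", '"')


def safe_parse(raw: str) -> Optional[dict]:
    # Hypothetical helper: parse after the quote swap and return None
    # when the result is not valid JSON.
    try:
        return json.loads(quotes_to_json(raw))
    except json.JSONDecodeError:
        return None


rows = ["{'c': 1, 'd': 2}", "{'c': 7, 'd': 0}", "{'c': 5, 'd': 4}"]
parsed = [safe_parse(r) for r in rows]

# Keys seen across all rows; after the pivot these would be the column names.
keys = sorted({k for d in parsed if d is not None for k in d})

# A string value containing an apostrophe no longer parses after the swap.
broken = safe_parse("{'name': \"O'Brien\"}")
```

For the sample data every row parses and the collected keys are c and d, which matches the columns produced by the pivot. If your dictionaries may contain apostrophes inside values, a plain quote replacement is probably not enough and the strings would need a more careful conversion before from_json.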