Merging Python dictionaries into a Spark DataFrame when the dictionaries have different keys

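You can pass the list of dictionaries directly to the `createDataFrame` function:

```python
from pyspark.sql import SparkSession

# In the pyspark shell a session named `spark` already exists
spark = SparkSession.builder.getOrCreate()

l = [{'a': 1, 'b': 2, 'c': 3}, {'b': 4, 'c': 5, 'd': 6, 'e': 7}]
df = spark.createDataFrame(l)
# UserWarning: inferring schema from dict is deprecated,please use pyspark.sql.Row instead
#   warnings.warn("inferring schema from dict is deprecated
df.show()
# +----+---+---+----+----+
# |   a|  b|  c|   d|   e|
# +----+---+---+----+----+
# |   1|  2|  3|null|null|
# |null|  4|  5|   6|   7|
# +----+---+---+----+----+
```

In addition, supply a `schema` for the columns, since schema inference from dictionaries is deprecated. Creating the DataFrame from `Row` objects instead would require all the dictionaries to have the same columns. So define the schema programmatically, by merging the keys from all the dictionaries involved:

```python
from pyspark.sql.types import StructType, IntegerType

# Merge the keys of several dicts into one sorted list of column names
def merge_keys(*dict_args):
    result = set()
    for dict_arg in dict_args:
        for key in dict_arg.keys():
            result.add(key)
    return sorted(result)

# Generate a schema for the given list of column names
def generate_schema(columns):
    result = StructType()
    for column in columns:
        # Change the type and nullability as needed
        result.add(column, IntegerType(), nullable=True)
    return result

# Keys missing from a dict become null in the corresponding column
df = spark.createDataFrame(l, schema=generate_schema(merge_keys(*l)))
```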
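If you would rather use `Row` objects, the requirement that every dictionary has the same columns can be met by padding each dictionary with the keys it is missing. The following is a minimal sketch of that idea, not part of the original answer: the helper names `all_keys` and `pad_dicts` are hypothetical, and filling gaps with `None` is an assumption (Spark stores `None` as null in a nullable column). The Spark step is only shown as a comment because it needs a running session.

```python
from typing import Any, Dict, List, Optional


def all_keys(dicts: List[Dict[str, Any]]) -> List[str]:
    """Return the sorted union of keys across all dictionaries."""
    keys = set()
    for d in dicts:
        keys.update(d)  # iterating a dict yields its keys
    return sorted(keys)


def pad_dicts(dicts: List[Dict[str, Any]], fill: Optional[Any] = None) -> List[Dict[str, Any]]:
    """Give every dictionary the same keys, in the same sorted order.

    Keys missing from a dictionary are set to `fill` (None by default).
    """
    keys = all_keys(dicts)
    return [{k: d.get(k, fill) for k in keys} for d in dicts]


l = [{'a': 1, 'b': 2, 'c': 3}, {'b': 4, 'c': 5, 'd': 6, 'e': 7}]
padded = pad_dicts(l)
# padded[0] == {'a': 1, 'b': 2, 'c': 3, 'd': None, 'e': None}
# padded[1] == {'a': None, 'b': 4, 'c': 5, 'd': 6, 'e': 7}

# Assumed follow-up with an existing SparkSession named `spark`:
#   from pyspark.sql import Row
#   df = spark.createDataFrame([Row(**d) for d in padded])
```

Because the keys are sorted before padding, every resulting `Row` gets the same field order whether or not the Spark version in use sorts `Row` keyword arguments.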
