Splitting dictionaries/lists in a Pandas column into separate columns

3 Answers

繁星淼淼

To convert the strings to actual dicts, you can do df['Pollutant Levels'].map(eval). Afterwards, the solution below can be used to convert the dicts to separate columns.

Using a small example, you can use .apply(pd.Series):

    In [2]: df = pd.DataFrame({'a':[1,2,3], 'b':[{'c':1}, {'d':3}, {'c':5, 'd':6}]})

    In [3]: df
    Out[3]:
       a                   b
    0  1           {u'c': 1}
    1  2           {u'd': 3}
    2  3  {u'c': 5, u'd': 6}

    In [4]: df['b'].apply(pd.Series)
    Out[4]:
         c    d
    0  1.0  NaN
    1  NaN  3.0
    2  5.0  6.0

To combine it with the rest of the dataframe, you can concat the other columns with the above result:

    In [7]: pd.concat([df.drop(['b'], axis=1), df['b'].apply(pd.Series)], axis=1)
    Out[7]:
       a    c    d
    0  1  1.0  NaN
    1  2  NaN  3.0
    2  3  5.0  6.0

Using your code, this also works if I leave out the iloc part:

    In [15]: pd.concat([df.drop('b', axis=1), pd.DataFrame(df['b'].tolist())], axis=1)
    Out[15]:
       a    c    d
    0  1  1.0  NaN
    1  2  NaN  3.0
    2  3  5.0  6.0
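The steps above can be put together roughly as follows. This is only a minimal sketch: the sample data is made up, and `ast.literal_eval` is swapped in for `eval` as an assumption on my part, since it parses Python literals only and does not execute arbitrary code. Passing `index=df.index` is also an addition, so that `concat` still aligns rows when the index is not the default 0..n-1.

```python
import ast

import pandas as pd

# Hypothetical sample: the dicts arrive as strings, as described in the question.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": ["{'c': 1}", "{'d': 3}", "{'c': 5, 'd': 6}"],
})

# Parse each string into a real dict (literal_eval accepts literals only).
parsed = df["b"].map(ast.literal_eval)

# Build the expanded frame from the list of dicts, reusing the original index
# so the rows line up with df during concat.
expanded = pd.DataFrame(parsed.tolist(), index=df.index)

# Drop the source column and attach the new columns side by side.
result = pd.concat([df.drop(columns="b"), expanded], axis=1)
print(result)
```

Keys missing from a row come out as NaN, which is why the integer values are shown as floats.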


千万里不及你

Try this: the data returned from SQL has to be converted into a dict. Or could it be that "Pollutant Levels" is now Pollutants'?

       StationID                   Pollutants
    0       8809  {"a":"46","b":"3","c":"12"}
    1       8810   {"a":"36","b":"5","c":"8"}
    2       8811            {"b":"2","c":"7"}
    3       8812                   {"c":"11"}
    4       8813          {"a":"82","c":"15"}

    df2["Pollutants"] = df2["Pollutants"].apply(lambda x : dict(eval(x)) )
    df3 = df2["Pollutants"].apply(pd.Series )

         a    b   c
    0   46    3  12
    1   36    5   8
    2  NaN    2   7
    3  NaN  NaN  11
    4   82  NaN  15

    result = pd.concat([df, df3], axis=1).drop('Pollutants', axis=1)
    result

       StationID    a    b   c
    0       8809   46    3  12
    1       8810   36    5   8
    2       8811  NaN    2   7
    3       8812  NaN  NaN  11
    4       8813   82  NaN  15
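Since the strings shown here happen to be valid JSON, a variant that avoids `eval` might look like the sketch below. It assumes every row is a well-formed JSON object, and the `pd.to_numeric` step is an extra not in the answer above: the values are quoted strings such as "46", so they stay text unless converted.

```python
import json

import pandas as pd

# Sample rows copied from the table above.
df = pd.DataFrame({
    "StationID": [8809, 8810, 8811, 8812, 8813],
    "Pollutants": [
        '{"a":"46","b":"3","c":"12"}',
        '{"a":"36","b":"5","c":"8"}',
        '{"b":"2","c":"7"}',
        '{"c":"11"}',
        '{"a":"82","c":"15"}',
    ],
})

# Parse the JSON text into dicts, then expand each dict into columns.
levels = df["Pollutants"].map(json.loads).apply(pd.Series)

# The parsed values are still strings; convert each column to numbers.
levels = levels.apply(pd.to_numeric)

# join aligns on the index, so each station keeps its own readings.
result = df.drop(columns="Pollutants").join(levels)
print(result)
```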


红糖糍粑

We don't need a lambda function. The evaluation of the dictionary can be safely skipped with either of the following two ways, as illustrated below.

Way 1: two steps

    # step 1: convert the `Pollutants` column to Pandas dataframe series
    df_pol_ps = data_df['Pollutants'].apply(pd.Series)

    df_pol_ps:
         a    b   c
    0   46    3  12
    1   36    5   8
    2  NaN    2   7
    3  NaN  NaN  11
    4   82  NaN  15

    # step 2: concat columns `a, b, c` and drop/remove the `Pollutants`
    df_final = pd.concat([df, df_pol_ps], axis = 1).drop('Pollutants', axis = 1)

    df_final:
       StationID    a    b   c
    0       8809   46    3  12
    1       8810   36    5   8
    2       8811  NaN    2   7
    3       8812  NaN  NaN  11
    4       8813   82  NaN  15

Way 2: the above two steps can be combined in one go

    df_final = pd.concat([df, df['Pollutants'].apply(pd.Series)], axis = 1).drop('Pollutants', axis = 1)

    df_final:
       StationID    a    b   c
    0       8809   46    3  12
    1       8810   36    5   8
    2       8811  NaN    2   7
    3       8812  NaN  NaN  11
    4       8813   82  NaN  15
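One caveat worth spelling out: skipping the evaluation only splits the column when it already holds dict objects. Applied to strings, `.apply(pd.Series)` just yields a single column of the same strings. A rough sketch of the one-step version wrapped in a function with that check could look like this; `expand_dict_column` is a hypothetical name for illustration, not a pandas function, and the sample values are made up.

```python
import pandas as pd


def expand_dict_column(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: replace a dict-valued column with one column per key."""
    values = frame[column]
    # Fail early if any cell is not a dict (for example, an unparsed string).
    if not values.map(lambda v: isinstance(v, dict)).all():
        raise TypeError(f"column {column!r} must contain dicts; parse strings first")
    # Same one-step approach as Way 2: expand, then attach to the other columns.
    expanded = values.apply(pd.Series)
    return pd.concat([frame.drop(columns=column), expanded], axis=1)


# Made-up sample in which the column already holds real dicts.
df = pd.DataFrame({
    "StationID": [8809, 8810, 8811],
    "Pollutants": [
        {"a": 46, "b": 3, "c": 12},
        {"a": 36, "b": 5, "c": 8},
        {"b": 2, "c": 7},
    ],
})

df_final = expand_dict_column(df, "Pollutants")
print(df_final)
```

If the column comes back from SQL as text, the strings would need to be parsed into dicts before calling a helper like this.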
