Convert a pandas DataFrame to nested JSON

I have a DataFrame like the one below, where one column contains an already nested list of dictionaries:


import pandas as pd

data = {'First':  ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': ['[{"values": "","entity": "datetime","","Turn":  [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
    }

df = pd.DataFrame(data, columns=['First','second','third','forth'])

I want to convert it to the following JSON format and save it:


[
  {
    "first": "",
    "second": "",
    "third": "",
    "forth": [
      {
        "values": "",
        "entity": "",
        "TIMEX3": [
          {
            "expression": "",
            "tid": "",
            "type": "",
            "value": "",
            "mod": "",
            "anchorTimeID": "",
            "beginPoint": "",
            "endPoint": ""
          }
        ]
      }
    ]
  },...

I tried the following, but the output is too messy and looks nothing like what I want to save:


my_json = (df.groupby(['text','intent','domain'], as_index=False)
             .apply(lambda x: x[['entities']].to_dict('r'))
             .reset_index()
             .to_json(orient='records', indent=2))


鸿蒙传说
1 answer

慕妹3242003

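I believe you are not far from the format you want. The only problem is that the column forth holds its dictionaries as strings. One possible approach is to convert each row to a dictionary, use eval to turn the strings back into lists of dictionaries, and then pretty-print the result with the json module:

import pandas as pd
import json

data = {'First':  ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': ['[{"values": "","entity": "datetime","Turn":  [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]','[{"values": "","entity": "datetime","Turn": [{"expression": "","tid": "","type": "", "value": "","mod": "","anchor": "","beginPoint": "","endPoint": ""}]}]'],
    }

df = pd.DataFrame(data, columns=['First','Second','third','forth'])

# One dict per row; 'forth' is still a string at this point.
my_dict = df.to_dict(orient='records')
for row in my_dict:
    # Evaluate the string so it becomes a real list of dictionaries.
    row['forth'] = eval(row['forth'])

my_json = json.dumps(my_dict, indent=2)
print(my_json)

I made two small corrections to your code: the key Second is now capitalized in the columns list, and the invalid entry , "", is removed from the first string under your forth key. Here is a copy of my output:

[
  {
    "First": "First value",
    "Second": "First value",
    "third": "First value",
    "forth": [
      {
        "values": "",
        "entity": "datetime",
        "Turn": [
          {
            "expression": "",
            "tid": "",
            "type": "",
            "value": "",
            "mod": "",
            "anchor": "",
            "beginPoint": "",
            "endPoint": ""
          }
        ]
      }
    ]
  },
  ...

If the column forth already holds dictionaries (rather than strings) in the DataFrame, you can call to_json directly and the format will be what you want. For example, you can try converting the corrected data in my_dict back into a DataFrame:

test_df = pd.DataFrame(my_dict)
print(test_df.to_json(orient='records', indent=2))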
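Since the strings in forth are valid JSON once the stray "", entry is removed, json.loads may be a safer choice than eval: it only parses data and never executes code. The sketch below is one way this could look, and it also writes the result to disk, since the question asks about saving. The helper name frame_to_nested_records, the shortened sample strings, and the file name nested_output.json are assumptions made for illustration and do not come from the original post.

```python
import json

import pandas as pd

# Shortened version of the sample data; 'forth' holds JSON text, not lists.
data = {
    'First': ['First value', 'Second value'],
    'Second': ['First value', 'Second value'],
    'third': ['First value', 'Second value'],
    'forth': [
        '[{"values": "", "entity": "datetime", "Turn": [{"expression": "", "tid": ""}]}]',
        '[{"values": "", "entity": "datetime", "Turn": [{"expression": "", "tid": ""}]}]',
    ],
}
df = pd.DataFrame(data)


def frame_to_nested_records(frame, json_columns):
    """Return a list of row dicts with the given columns parsed from JSON text.

    Hypothetical helper, named here only for illustration.
    """
    records = frame.to_dict(orient='records')
    for row in records:
        for col in json_columns:
            # json.loads only parses data; unlike eval it never runs code.
            row[col] = json.loads(row[col])
    return records


records = frame_to_nested_records(df, ['forth'])

# 'nested_output.json' is a placeholder file name.
with open('nested_output.json', 'w', encoding='utf-8') as fh:
    json.dump(records, fh, indent=2, ensure_ascii=False)
```

If a string is not valid JSON (as with the stray "", entry in the question), json.loads raises json.JSONDecodeError, which points at the bad row instead of silently producing odd output.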