Creating a DataFrame by iterating over the nth-level values of a nested dictionary

I downloaded a JSON file of the ICD-11 classification of human diseases from the web. The data has up to 8 levels of nesting, for example:

  "name":"br08403",
    "children":[
    {
        "name":"01 Certain infectious or parasitic diseases",
        "children":[
        {
            "name":"Gastroenteritis or colitis of infectious origin",
            "children":[
            {
                "name":"Bacterial intestinal infections",
                "children":[
                {
                    "name":"1A00  Cholera",
                    "children":[
                    {
                        "name":"H00110  Cholera"
                    }

I tried the following code:


import pandas as pd


def flatten_json(nested_json):
    """
    Flatten a nested JSON object into a single-level dict.

    Args:
        nested_json: A nested JSON object (dicts and lists).

    Returns:
        A dict mapping underscore-joined key paths to leaf values.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            # Leaf value: drop the trailing underscore from the key path.
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# `dictionary` holds the parsed JSON (its definition is not shown here).
df2 = pd.Series(flatten_json(dictionary)).to_frame()

The output I get is:


name                                                br08403
children_0_name                                     01 Certain infectious or parasitic diseases
children_0_children_0_name                          Gastroenteritis or colitis of infectious origin
children_0_children_0_children_0_name               Bacterial intestinal infections
children_0_children_0_children_0_children_0_name    1A00 Cholera
... ...
children_21_children_17_children_10_name            NF0A Certain early complications of trauma, n...
children_21_children_17_children_11_name            NF0Y Other specified effects of external causes
children_21_children_17_children_12_name            NF0Z Unspecified effects of external causes
children_21_children_18_name                        NF2Y Other specified injury, poisoning or cer...
children_21_children_19_name                        NF2Z Unspecified injury, poisoning or certain..

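But the desired output is a DataFrame with 8 columns, one per nesting level, so that each row holds the "name" values along a path down to the deepest level, for example: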

http://img3.sycdn.imooc.com/64b629d70001ee5609360116.jpg

FFIVE

肥皂起泡泡

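A simple iterative approach using pandas:

import pandas as pd
import requests

res = requests.get("https://www.genome.jp/kegg-bin/download_htext?htext=br08403.keg&format=json&filedir=")
js = res.json()
df = pd.json_normalize(js)
for i in range(20):
    # Explode each list of children into rows, then flatten the child dicts into columns.
    df = pd.json_normalize(df.explode("children").to_dict(orient="records"))
    # A plain "children" column can appear when some rows have no children (NaN); drop it.
    if "children" in df.columns:
        df.drop(columns="children", inplace=True)
    df = df.rename(columns={"children.name": f"level{i}", "children.children": "children"})
    # Stop once there is no deeper level left to expand.
    if df[f"level{i}"].isna().all() or "children" not in df.columns:
        break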
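For comparison, the same table can probably also be built without repeated normalization, by walking the tree recursively and recording one root-to-leaf path per row. This is only a minimal sketch, not the method above: `iter_paths`, `tree_to_frame` and the `sample` dict are hypothetical names introduced here, the sample is a tiny hand-made stand-in for the real file (the "ZZ99" node is invented purely to show a shorter branch), and it assumes every node is a dict with a "name" key and an optional "children" list. Unlike the loop above, this sketch puts the root name in `level0`.

```python
import pandas as pd

# Hypothetical miniature of the hierarchy; NOT the real KEGG file.
sample = {
    "name": "br08403",
    "children": [{
        "name": "01 Certain infectious or parasitic diseases",
        "children": [{
            "name": "Gastroenteritis or colitis of infectious origin",
            "children": [{
                "name": "Bacterial intestinal infections",
                "children": [
                    {"name": "1A00  Cholera",
                     "children": [{"name": "H00110  Cholera"}]},
                    # Invented placeholder leaf to show a shorter path.
                    {"name": "ZZ99  Hypothetical leaf"},
                ],
            }],
        }],
    }],
}


def iter_paths(node, prefix=()):
    """Yield one tuple of names per leaf, from the root down to that leaf."""
    path = prefix + (node["name"],)
    children = node.get("children")
    if not children:
        yield path
        return
    for child in children:
        yield from iter_paths(child, path)


def tree_to_frame(tree):
    """Build a DataFrame with one column per depth and one row per leaf."""
    rows = list(iter_paths(tree))
    depth = max(len(row) for row in rows)
    # Pad shorter paths so that every row has the same number of columns.
    padded = [row + (None,) * (depth - len(row)) for row in rows]
    return pd.DataFrame(padded, columns=[f"level{i}" for i in range(depth)])


df = tree_to_frame(sample)
print(df)
```

With the real data, `tree_to_frame(js)` would be the analogous call, provided the downloaded JSON follows the same name/children layout as the excerpt in the question.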