Converting web-scraped data into a DataFrame

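I scraped some data and tried to convert it to JSON format, but it does not seem to work. I want to turn the result into a dictionary of keys and values and then convert that into a DataFrame.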


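from urllib.request import Request, urlopen
from bs4 import BeautifulSoup
import bs4
import requests
import json

# `url` is the address of the JSON endpoint being scraped (not shown in the question)
req = Request(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6)'})
webpage = urlopen(req).read().decode("utf-8")
webpage = json.loads(webpage)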

Output:


{'data': [{'id': 'GILD',
   'attributes': {'longDesc': "Gilead Sciences, Inc., a research-based biopharmaceutical company, discovers, develops, and commercializes medicines in the areas of unmet medical needs in the United States, Europe, and internationally. It was founded in 1987 and is headquartered in Foster City, California.",
    'sectorname': 'Health Care',
    'sectorgics': 35,
    'primaryname': 'Biotechnology',
    'primarygics': 35201010,
    'numberOfEmployees': 11800.0,
    'yearfounded': 1987,
    'streetaddress': '333 Lakeside Drive',
    'streetaddress2': None,
    'streetaddress3': None,
    'streetaddress4': None,
    'city': 'Foster City',
    'peRatioFwd': 9.02045209903122,
    'lastClosePriceEarningsRatio': None,
    'divRate': 2.72,
    'divYield': 4.33,
    'shortIntPctFloat': 1.433,
    'impliedMarketCap': None,
    'marketCap': 78796576654.0,
    'divTimeFrame': 'forward'}}]}

The result I want is:


df = {'id': 'GILD', 'longDesc': 'Gilead...'}


紫衣仙女
1 answer

当年话下

Based on your data, you can try the following as part of your code:

d = {
    'data': [
        {
            'id': 'GILD',
            'attributes': {
                'longDesc': "Gilead Sciences, Inc., a research-based biopharmaceutical company, discovers, develops, and commercializes medicines in the areas of unmet medical needs in the United States, Europe, and internationally. It was founded in 1987 and is headquartered in Foster City, California.",
                'sectorname': 'Health Care',
                'sectorgics': 35,
                'primaryname': 'Biotechnology',
                'primarygics': 35201010,
                'numberOfEmployees': 11800.0,
                'yearfounded': 1987,
                'streetaddress': '333 Lakeside Drive',
                'streetaddress2': None,
                'streetaddress3': None,
                'streetaddress4': None,
                'city': 'Foster City',
                'peRatioFwd': 9.02045209903122,
                'lastClosePriceEarningsRatio': None,
                'divRate': 2.72,
                'divYield': 4.33,
                'shortIntPctFloat': 1.433,
                'impliedMarketCap': None,
                'marketCap': 78796576654.0,
                'divTimeFrame': 'forward'}
        }
    ]
}

try:
    # Pull the two wanted fields out of the first record
    _id = d['data'][0]['id']
    ld = d['data'][0]['attributes']['longDesc']
    df = {"id": _id, 'longDesc': ld}
except (KeyError, IndexError) as error:
    print(f"Failed to load data: {error}")
else:
    # Only print when the lookup succeeded, so df is always defined here
    print(df)

Output:

{'id': 'GILD', 'longDesc': 'Gilead Sciences, Inc., a research-based biopharmaceutical company, discovers, develops, and commercializes medicines in the areas of unmet medical needs in the United States, Europe, and internationally. It was founded in 1987 and is headquartered in Foster City, California.'}

Note: df usually refers to a DataFrame, which is mostly created with the pandas module. What you have, however, is probably the JSON object returned by the request you made. That said, the output you want is actually a dictionary, but I kept your naming convention.

Edit: to convert your dict into a DataFrame, just do the following:

import pandas as pd

d = {'id': 'GILD', 'longDesc': 'Gilead Sciences, Inc., a research-based biopharmaceutical company, discovers, develops, and commercializes medicines in the areas of unmet medical needs in the United States, Europe, and internationally. It was founded in 1987 and is headquartered in Foster City, California.'}
df = pd.DataFrame(d.items())
print(df)

This outputs:

          0                                                  1
0        id                                               GILD
1  longDesc  Gilead Sciences, Inc., a research-based biopha...
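If the goal is one row per company with every attribute as its own column, rather than only id and longDesc, a minimal sketch could look like the following. It assumes the response always has the shape shown in the question (a 'data' list whose items carry 'id' and 'attributes'). The helper name records_to_frame and the trimmed payload are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Trimmed copy of the response from the question; only a few attributes are kept.
payload = {
    "data": [
        {
            "id": "GILD",
            "attributes": {
                "sectorname": "Health Care",
                "yearfounded": 1987,
                "city": "Foster City",
                "streetaddress2": None,
            },
        }
    ]
}


def records_to_frame(payload):
    """Hypothetical helper: one row per item in payload['data'], one column per attribute."""
    rows = [
        # Merge the top-level id with the nested attributes into one flat dict
        {"id": item["id"], **item.get("attributes", {})}
        for item in payload.get("data", [])
    ]
    return pd.DataFrame(rows)


df = records_to_frame(payload)
print(df)
```

pandas also provides pd.json_normalize(payload["data"]), which flattens nested dicts in a similar way but prefixes the nested columns (for example attributes.sectorname), so the explicit merge above may be preferable when plain column names are wanted.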