Scraping JSON with Scrapy in Python

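I'm trying to figure out how to scrape a JSON response using Scrapy in Python. I have already been able to scrape JSON successfully from a different page on the same site. Any help would be appreciated.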


How can I scrape the values inside "tournamentGroup" (i.e. id, name), as well as year, title, and so on?


Partial code:


start_url = 'https://api.wtatennis.com/tennis/tournaments/?page=0&pageSize=100&excludeLevels=ITF&from=2020-09-01&to=2020-09-30'

with urllib.request.urlopen(start_url) as start_url:
    json_obj = start_url.read()
    rank_list = json.loads(json_obj)

    for item in rank_list:
        rank_data = []
        tourney_id = item['content']['id']
        tourney_year = item['year']

        rank_data = [tourney_id, tourney_year]

        cur.execute("""insert into wta_rankings(tourney_id, tourney_year)
                    values(%s, %s)
                    ON CONFLICT DO NOTHING"""
                    ,(rank_data))
        conn.commit()
    cur.close()


JSON:

{
   "pageInfo":{
      "page":0,
      "numPages":0,
      "pageSize":100,
      "numEntries":10
   },
   "content":[
      {
         "tournamentGroup":{
            "id":2023,
            "name":"Prague 125K",
            "level":"125K",
            "metadata":null
         },
         "year":2020,
         "title":"Prague Open",
         "startDate":"2020-08-29",
         "endDate":"2020-09-06",
         "surface":"Clay",
         "inOutdoor":"O",
         "city":"PRAGUE",
         "country":"Czech Republic",
         "singlesDrawSize":128,
         "doublesDrawSize":32,
         "prizeMoney":3125000,
         "prizeMoneyCurrency":"USD",
         "liveScoringId":"2023"
      },

Example URL: https://api.wtatennis.com/tennis/tournaments/?page=0&pageSize=100&excludeLevels=ITF&from=2020-09-01&to=2020-09-30


缥缈止盈
1 answer

摇曳的蔷薇

Try this:

import requests

url = "https://api.wtatennis.com/tennis/tournaments/?page=0&pageSize=100&excludeLevels=ITF&from=2020-09-01&to=2020-09-30"
response = requests.get(url).json()

for item in response["content"]:
    print(f"{item['tournamentGroup']['name']} - {item['year']} - {item['title']}")

This gives you the following (it is only an example; you can pull any field you want):

Prague 125K - 2020 - Prague Open
US OPEN - 2020 - US Open - New York, United States, NY
WARSAW - 2020 - BNP Paribas Warsaw Open - Warsaw, Poland
ISTANBUL - 2020 - TEB BNP Paribas Tennis Championship Istanbul - Istanbul, Turkey
MADRID - 2020 - Mutua Madrid Open - Madrid, Spain
HIROSHIMA - 2020 - Hana-cupid Japan Women's Open - Hiroshima, Japan
ROME - 2020 - Internazionali BNL d'Italia - Rome, Italy
STRASBOURG - 2020 - Internationaux de Strasbourg - Strasbourg, France
ROLAND GARROS - 2020 - Roland Garros - Paris, France
TASHKENT - 2020 - Tashkent Open - Tashkent, Uzbekistan

If you have trouble "navigating" the JSON, just copy the response content into an online JSON formatter, click the wrench icon to fix it, and then click Format / Beautify.
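The loop in the question iterates over the top-level object, which yields its keys ("pageInfo", "content") rather than the tournaments, so `item['content']['id']` cannot work. Below is a minimal sketch of collecting the requested fields, assuming the response has the shape shown in the question. `extract_rows` and `SAMPLE` are hypothetical names introduced here for illustration, and `SAMPLE` is a trimmed copy of the posted JSON so the sketch runs without network access.

```python
import json

# Trimmed copy of the JSON from the question (assumption: the live
# response keeps this shape).
SAMPLE = """
{
  "pageInfo": {"page": 0, "numPages": 0, "pageSize": 100, "numEntries": 10},
  "content": [
    {
      "tournamentGroup": {"id": 2023, "name": "Prague 125K",
                          "level": "125K", "metadata": null},
      "year": 2020,
      "title": "Prague Open"
    }
  ]
}
"""


def extract_rows(payload):
    """Return one (group_id, group_name, year, title) tuple per tournament."""
    rows = []
    # The tournaments live in the "content" list, not at the top level.
    for item in payload.get("content", []):
        # "tournamentGroup" is a nested object; fall back to {} if missing.
        group = item.get("tournamentGroup") or {}
        rows.append(
            (group.get("id"), group.get("name"), item.get("year"), item.get("title"))
        )
    return rows


if __name__ == "__main__":
    for row in extract_rows(json.loads(SAMPLE)):
        print(row)
```

Each tuple could then be passed as the parameters of an insert like the one in the question, with the column list widened to match. Inside a Scrapy callback the same function should work on `json.loads(response.text)`.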