If you analyze the website's network calls, you'll see that it makes an AJAX request to fetch the links to all of the downloadable data:

```python
import requests

res = requests.get("https://ped.uspto.gov/api/")
data = res.json()
print(data)
```

Output:

```
{'message': None, 'helpText': '{}',
 'xmlDownloadMetadata': [
  {'lastUpdated': 'Sat 15 Aug 2020 01:30:57-0400', 'sizeInBytes': 10429068701, 'fileName': 'pairbulk-delta-20200815-xml', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 10:20:10-0400', 'sizeInBytes': 100685778, 'fileName': '1900-1919-pairbulk-full-20200809-xml', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 10:20:14-0400', 'sizeInBytes': 13877, 'fileName': '1920-1939-pairbulk-full-20200809-xml', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 10:20:15-0400', 'sizeInBytes': 93016, 'fileName': '1940-1959-pairbulk-full-20200809-xml', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 10:20:15-0400', 'sizeInBytes': 82353484, 'fileName': '1960-1979-pairbulk-full-20200809-xml', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 10:20:16-0400', 'sizeInBytes': 5019098918, 'fileName': '1980-1999-pairbulk-full-20200809-xml', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 10:20:46-0400', 'sizeInBytes': 33231977060, 'fileName': '2000-2019-pairbulk-full-20200809-xml', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 10:23:23-0400', 'sizeInBytes': 24313575, 'fileName': '2020-2020-pairbulk-full-20200809-xml', 'updatedFile': False}],
 'jsonDownloadMetadata': [
  {'lastUpdated': 'Sat 15 Aug 2020 03:08:00-0400', 'sizeInBytes': 5957650088, 'fileName': 'pairbulk-delta-20200815-json', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 15:18:23-0400', 'sizeInBytes': 66467976, 'fileName': '1900-1919-pairbulk-full-20200809-json', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 15:18:25-0400', 'sizeInBytes': 10100, 'fileName': '1920-1939-pairbulk-full-20200809-json', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 15:18:27-0400', 'sizeInBytes': 69891, 'fileName': '1940-1959-pairbulk-full-20200809-json', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 15:18:29-0400', 'sizeInBytes': 54076774, 'fileName': '1960-1979-pairbulk-full-20200809-json', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 15:18:31-0400', 'sizeInBytes': 3009216952, 'fileName': '1980-1999-pairbulk-full-20200809-json', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 15:18:46-0400', 'sizeInBytes': 18853619536, 'fileName': '2000-2019-pairbulk-full-20200809-json', 'updatedFile': False},
  {'lastUpdated': 'Sun 09 Aug 2020 15:20:30-0400', 'sizeInBytes': 17518389, 'fileName': '2020-2020-pairbulk-full-20200809-json', 'updatedFile': False}],
 'links': [{'rel': 'swagger-api-docs', 'href': '/api-docs'}]}
```

Parse the JSON and you can use these entries to download the files you're looking for. Note, though, that these files are very large (several are multiple gigabytes according to `sizeInBytes`), so it's better to use a streaming download with requests instead of loading the whole response into memory.

The entry you're looking for is the first element of `data["jsonDownloadMetadata"]`.

To get the downloadable links, parse the JSON and build a URL from each entry's `fileName`:

```python
# Continues from the snippet above (reuses `res`)
data = res.json()
for entry in data["jsonDownloadMetadata"]:
    print(f"https://ped.uspto.gov/api/full-download?fileName={entry['fileName']}")
```

Output:

```
https://ped.uspto.gov/api/full-download?fileName=pairbulk-delta-20200815-json
https://ped.uspto.gov/api/full-download?fileName=1900-1919-pairbulk-full-20200809-json
https://ped.uspto.gov/api/full-download?fileName=1920-1939-pairbulk-full-20200809-json
https://ped.uspto.gov/api/full-download?fileName=1940-1959-pairbulk-full-20200809-json
https://ped.uspto.gov/api/full-download?fileName=1960-1979-pairbulk-full-20200809-json
https://ped.uspto.gov/api/full-download?fileName=1980-1999-pairbulk-full-20200809-json
https://ped.uspto.gov/api/full-download?fileName=2000-2019-pairbulk-full-20200809-json
https://ped.uspto.gov/api/full-download?fileName=2020-2020-pairbulk-full-20200809-json
```
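Because the files are so large (the 2000-2019 JSON dump alone reports a `sizeInBytes` of 18853619536), here is a minimal sketch of a streaming download with requests. It assumes the `full-download` URL returns the file body directly; the helper names `build_download_url` and `stream_download`, the 1 MiB chunk size, and the 60-second timeout are hypothetical, illustrative choices and not part of the USPTO API.

```python
import urllib.parse

import requests

# Base endpoint taken from the URLs printed above (assumed to return the raw file body).
DOWNLOAD_BASE = "https://ped.uspto.gov/api/full-download"


def build_download_url(file_name: str) -> str:
    """Hypothetical helper: build the download URL for one entry's fileName (URL-encoded)."""
    return f"{DOWNLOAD_BASE}?{urllib.parse.urlencode({'fileName': file_name})}"


def stream_download(url: str, dest_path: str, chunk_size: int = 1024 * 1024) -> int:
    """Hypothetical helper: save `url` to `dest_path` chunk by chunk; return bytes written.

    stream=True makes requests read the body lazily, so roughly one chunk is held
    in memory at a time instead of the whole multi-gigabyte file.
    """
    written = 0
    # The timeout applies to connecting and to each socket read, not the whole transfer.
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()  # fail early on 4xx/5xx instead of saving an error page
        with open(dest_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty chunks
                    f.write(chunk)
                    written += len(chunk)
    return written
```

With `data` from the first snippet, `stream_download(build_download_url(data["jsonDownloadMetadata"][0]["fileName"]), "pairbulk-delta-20200815-json")` would fetch the first (delta) entry. The local file name is your choice; check the response's `Content-Type` or `Content-Disposition` header if you need to know the actual archive format.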