How do I iterate over the HTML links in a table to extract data from each linked page?

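I'm trying to work through the table at https://bgp.he.net/report/world. I want to follow each HTML link to a country page, grab the data there, and then move on to the next entry in the list. I'm using Beautiful Soup and can already get the data I want from a single country page, but I can't figure out how to iterate over the column of HTML links.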

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0'}

url = "https://bgp.he.net/country/LC"
html = requests.get(url, headers=headers)

# The two-letter country code is the last path segment of the URL
country_ID = url[-2:]

print("\n")

soup = BeautifulSoup(html.text, 'html.parser')

data = []
for row in soup.find_all("tr")[1:]:  # start from second row (skip the header)
    cells = row.find_all('td')
    data.append({
        'ASN': cells[0].text,
        'Country': country_ID,
        "Name": cells[1].text,
        "Routes V4": cells[3].text,
        "Routes V6": cells[5].text
    })

# Write one record per line
with open('table_attempt.txt', 'w') as r:
    for item in data:
        r.write(str(item))
        r.write("\n")

print(data)

I'd like to be able to collect the data for every country into a single text file.
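One possible way to get from the single-country script to all countries is sketched below. It assumes that the world report page links to each country with an href of the form /country/XX, and that every country page has the same six-column table as the LC page above; neither assumption is confirmed here. The function names (extract_country_links, parse_country_rows, scrape_all) and the one-second delay are made up for illustration.

```python
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://bgp.he.net"
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:56.0) Gecko/20100101 Firefox/56.0'}


def extract_country_links(report_html, base_url=BASE_URL):
    """Return absolute URLs of the distinct /country/XX links found in the page.

    Assumption: country links use relative hrefs that start with "/country/".
    """
    soup = BeautifulSoup(report_html, 'html.parser')
    links = []
    for a in soup.find_all('a', href=True):
        if a['href'].startswith('/country/'):
            url = urljoin(base_url, a['href'])
            if url not in links:  # a row may link to the same country twice
                links.append(url)
    return links


def parse_country_rows(country_html, country_id):
    """Extract the same fields as the single-country script, one dict per row."""
    soup = BeautifulSoup(country_html, 'html.parser')
    rows = []
    for row in soup.find_all('tr')[1:]:  # skip the header row
        cells = row.find_all('td')
        if len(cells) < 6:  # skip rows that lack the expected columns
            continue
        rows.append({
            'ASN': cells[0].text,
            'Country': country_id,
            'Name': cells[1].text,
            'Routes V4': cells[3].text,
            'Routes V6': cells[5].text,
        })
    return rows


def scrape_all(out_path='table_attempt.txt', delay=1.0):
    """Visit every country page and write one record per line to out_path."""
    report = requests.get(BASE_URL + '/report/world', headers=HEADERS, timeout=30)
    with open(out_path, 'w') as out:
        for url in extract_country_links(report.text):
            country_id = url.rstrip('/').rsplit('/', 1)[-1]
            page = requests.get(url, headers=HEADERS, timeout=30)
            for record in parse_country_rows(page.text, country_id):
                out.write(str(record) + "\n")
            time.sleep(delay)  # pause between requests to be polite to the server
```

Calling scrape_all() would fetch the report page once and then each country page in turn. Keeping the two parsing functions free of network calls means they can be checked against saved HTML before running the full crawl.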

倚天杖
