How do I scrape the contents of the "sorting_1" class with Python?

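I was given a project to build a COVID tracker. I decided to scrape some elements from the website ( https://www.worldometers.info/coronavirus/ ). I am very new to Python, so I decided to use BeautifulSoup. I was able to scrape the basic elements such as total cases, active cases, and so on. However, whenever I try to get the country names or their numbers, I get an empty list. Even though the class "sorting_1" exists when I inspect the page, find_all still returns an empty list. Can someone point out where I am going wrong?

This is what I am trying to scrape:

This is my current code: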

import requests
import bs4

# making a request and a soup
res = requests.get('https://www.worldometers.info/coronavirus/')
soup = bs4.BeautifulSoup(res.text, 'lxml')

# scraping starts here
total_cases = soup.select('.maincounter-number')[0].text
total_deaths = soup.select('.maincounter-number')[1].text
total_recovered = soup.select('.maincounter-number')[2].text
active_cases = soup.select('.number-table-main')[0].text
country_cases = soup.find_all('td', {'class': 'sorting_1'})

慕田峪7331174

2 Answers

浮云间

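You can't get the sorting_1 class because it is not present in the page source. Instead, find all the rows in the table and then read the information from the columns you need. So, to get the total cases for each country, you can use the following code:

import requests
import bs4

res = requests.get('https://www.worldometers.info/coronavirus/')
soup = bs4.BeautifulSoup(res.text, 'lxml')

# select every row of the main table, then read cells by column position
rows = soup.select('table#main_table_countries_today tr')
# the slice apparently skips the header and aggregate rows at the top;
# adjust it if the page layout changes
for row in rows[8:18]:
    tds = row.find_all('td')
    # tds[1] is the country name, tds[2] is the total cases
    print(tds[1].text.strip(), '=', tds[2].text.strip())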
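To make the row-and-column idea easier to try without hitting the live site, here is a minimal sketch that runs against a small inline HTML sample. The sample markup, the helper name parse_country_cases, and the column positions are assumptions for illustration; the real table has many more columns and rows, and it uses the built-in 'html.parser' instead of 'lxml' only to avoid an extra dependency.

```python
import bs4

# Hypothetical, heavily simplified stand-in for the real page.
SAMPLE_HTML = """
<table id="main_table_countries_today">
  <thead><tr><th>#</th><th>Country</th><th>Total Cases</th></tr></thead>
  <tbody>
    <tr><td>1</td><td><a class="mt_a" href="country/us/">USA</a></td><td>1,234,567</td></tr>
    <tr><td>2</td><td><a class="mt_a" href="country/india/">India</a></td><td>987,654</td></tr>
  </tbody>
</table>
"""


def parse_country_cases(html):
    """Return {country: total_cases} read from the table by column position."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    cases = {}
    for row in soup.select("table#main_table_countries_today tr"):
        tds = row.find_all("td")
        if len(tds) < 3:
            # Header rows contain <th> cells only, so they have no <td>.
            continue
        name = tds[1].get_text(strip=True)
        digits = tds[2].get_text(strip=True).replace(",", "")
        if digits.isdigit():
            # Skip cells that are empty or not purely numeric.
            cases[name] = int(digits)
    return cases


print(parse_country_cases(SAMPLE_HTML))
```

Skipping rows by their cell count, rather than by a fixed slice, is one way to stay robust if rows are added above the countries, though the column indices still depend on the layout.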


白板的微信

These sorting_X classes seem to be added by JavaScript, so they do not exist in the raw HTML. The table itself does exist, though, so I would suggest looping over the table rows, something like this:

table_rows = soup.find("table", id="main_table_countries_today").find_all("tr")
for row in table_rows:
    name = "unknown"
    # Find country name
    for td in row.find_all("td"):
        link = td.find("a", class_="mt_a")
        if link:  # This kind of link apparently only exists in the "name" column
            name = link.text
    # Do some more scraping

Warning: I haven't used BeautifulSoup in a while, so this may not be 100% correct. You get the idea.
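A quick way to check whether a class is in the HTML that requests actually receives is to count matching elements in that raw text. The sketch below uses two hypothetical snippets (RAW_HTML for what the server might send, RENDERED_HTML for what the browser might show after scripts run) and a hypothetical helper count_td_with_class; neither snippet is copied from the real site.

```python
import bs4

# Hypothetical markup as served, before any JavaScript has run.
RAW_HTML = """
<table id="main_table_countries_today">
  <tr><td><a class="mt_a" href="country/us/">USA</a></td><td>1,234,567</td></tr>
</table>
"""

# Hypothetical markup as seen in the browser inspector after scripts ran.
RENDERED_HTML = """
<table id="main_table_countries_today">
  <tr><td><a class="mt_a" href="country/us/">USA</a></td><td class="sorting_1">1,234,567</td></tr>
</table>
"""


def count_td_with_class(html, class_name):
    """Count <td> elements carrying the given CSS class in a piece of HTML."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    return len(soup.find_all("td", class_=class_name))


print(count_td_with_class(RAW_HTML, "sorting_1"))       # class absent from raw HTML
print(count_td_with_class(RENDERED_HTML, "sorting_1"))  # class present after rendering
```

On the real page, passing res.text to the same helper would show whether the class is present in the downloaded source; a count of zero there is consistent with the empty list in the question.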
