Getting table information from a page with Python and BeautifulSoup

The page I'm trying to get information from is https://www.pro-football-reference.com/teams/crd/2017_roster.htm.

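I'm trying to get all the information from the "Roster" table, but for some reason I can't get it with BeautifulSoup. I've already tried soup.find("div", {'id': 'div_games_played_team'}), but it doesn't work. When I look at the page's HTML, I can see the table both inside a very large comment and in a regular div. How can I get the information from this table with BeautifulSoup?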
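A minimal offline reproduction of the symptom may help. It uses a hypothetical, heavily simplified stand-in for the page (the wrapper div id `all_roster_wrapper` and the single sample row are assumptions, not the real markup). With `html.parser`, commented-out markup is kept as one `Comment` string rather than parsed into tags, so `find()` cannot see a div that only exists inside the comment.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical, heavily simplified stand-in for the roster page: the table
# markup sits inside an HTML comment, as described in the question.
SAMPLE_HTML = """
<div id="all_roster_wrapper">
<!--
<div id="div_games_played_team">
  <table id="games_played_team">
    <tr><th>Player</th><th>Pos</th></tr>
    <tr><td>Budda Baker</td><td>ss</td></tr>
  </table>
</div>
-->
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The div exists only as text inside a Comment node, so this finds nothing.
print(soup.find("div", {"id": "div_games_played_team"}))  # None

# The comment itself is reachable as a string of type Comment.
comment = soup.find(string=lambda text: isinstance(text, Comment))
print("div_games_played_team" in comment)  # True
```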


jeck猫
2 Answers

30秒到达战场

You don't need Selenium. What you can do (and you identified it correctly) is extract the comments and then parse the tables out of them.

```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup, Comment

url = 'https://www.pro-football-reference.com/teams/crd/2017_roster.htm'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
# Collect every HTML comment on the page; the roster table is hidden inside one.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

tables = []
for each in comments:
    if 'table' in each:
        try:
            # Wrap the literal HTML in StringIO: newer pandas versions deprecate
            # passing raw HTML strings to read_html directly.
            tables.append(pd.read_html(StringIO(str(each)))[0])
        except ValueError as e:
            # read_html raises ValueError when the comment holds no <table>.
            print(e)
            continue
```

Output:

```python
print(tables[0].head().to_string())
```

```
   No.           Player   Age  Pos   G   GS     Wt    Ht  College/Univ  BirthDate   Yrs   AV                            Drafted (tm/rnd/yr)      Salary
0  54.0  Bryson Albright  23.0  NaN   7  0.0  245.0   6-5    Miami (OH)  3/15/1994     1  0.0                                            NaN    $246,177
1  36.0    Budda Baker*+  21.0   ss  16  7.0  195.0  5-10    Washington  1/10/1996  Rook  9.0     Arizona Cardinals / 2nd / 36th pick / 2017    $465,000
2  64.0    Khalif Barnes  35.0  NaN   3  0.0  320.0   6-6    Washington  4/21/1982    12  0.0  Jacksonville Jaguars / 2nd / 52nd pick / 2005    $176,471
3  41.0   Antoine Bethea  33.0   db  15  6.0  206.0  5-11        Howard  7/27/1984    11  4.0   Indianapolis Colts / 6th / 207th pick / 2006  $2,000,000
4  28.0    Justin Bethel  27.0  rcb  16  6.0  200.0   6-0  Presbyterian  6/17/1990     5  3.0    Arizona Cardinals / 6th / 177th pick / 2012  $2,000,000
....
```
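Since the question asks for BeautifulSoup specifically, here is a sketch of the same comment-extraction idea without pandas: find the comment that mentions the table's id, re-parse that comment as its own HTML document, then read the rows with `find_all`. The helper name `extract_commented_table`, the table id `games_played_team` (assumed from the div id in the question), and the sample markup are all hypothetical; against the live page you would pass `response.text` instead of `SAMPLE_HTML` and confirm the real table id in the page source.

```python
from bs4 import BeautifulSoup, Comment


def extract_commented_table(html, table_id):
    """Hypothetical helper: return the rows of <table id=table_id> that is
    hidden inside an HTML comment, as lists of cell strings, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if table_id not in comment:
            continue
        # Re-parse the comment's text so its markup becomes real tags.
        inner = BeautifulSoup(str(comment), "html.parser")
        table = inner.find("table", id=table_id)
        if table is not None:
            return [
                [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
                for row in table.find_all("tr")
            ]
    return None


# Hypothetical, simplified sample markup standing in for response.text.
SAMPLE_HTML = """
<div>
<!--
<table id="games_played_team">
  <tr><th>Player</th><th>Pos</th></tr>
  <tr><td>Budda Baker</td><td>ss</td></tr>
</table>
-->
</div>
"""

rows = extract_commented_table(SAMPLE_HTML, "games_played_team")
print(rows)  # [['Player', 'Pos'], ['Budda Baker', 'ss']]
```

Returning plain lists keeps the sketch dependency-free; the rows can still be handed to `pd.DataFrame(rows[1:], columns=rows[0])` if a DataFrame is wanted.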

慕无忌1623718

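The tag you're trying to scrape is generated dynamically by JavaScript. You're most likely using requests to fetch the HTML. Unfortunately, requests doesn't run JavaScript; it only downloads the raw HTML as text. BeautifulSoup can't find the tag because it is never generated in your scraper.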
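One rough way to check which situation applies to a given page is to look at where an id appears in the raw HTML that requests returns: as a real tag, only inside a comment, or not at all. Only the last case points to content that must be created by JavaScript. The `locate_id` helper below is hypothetical and uses plain substring matching on comments, so it can report false positives.

```python
from bs4 import BeautifulSoup, Comment


def locate_id(html, element_id):
    """Hypothetical diagnostic: report where element_id appears in raw HTML."""
    soup = BeautifulSoup(html, "html.parser")
    if soup.find(id=element_id) is not None:
        return "element"  # a real tag that BeautifulSoup can find directly
    for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
        if element_id in comment:
            return "comment"  # present in the HTML, but commented out
    return "absent"  # not in the raw HTML; possibly added later by JavaScript


print(locate_id('<div id="x"></div>', "x"))                   # element
print(locate_id('<div><!-- <div id="x"></div> --></div>', "x"))  # comment
print(locate_id("<div></div>", "x"))                           # absent
```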