Need to scrape data from a website using XPath and BeautifulSoup

Hi everyone,

Website link

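I am trying to scrape a table named "Open Bets", but unfortunately the table has no class or id. I am using BeautifulSoup to scrape the table and XPath to locate it, but nothing is returned, as shown in the screenshot below: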

https://img1.sycdn.imooc.com/658250400001cb8812000258.jpg

I want to scrape the data from the table and pick out the columns named "Team A" and "Team B". The goal is to display the data like this:


print(Player1," vs ",Player2)

print("Odds ",odds)

print("Rate ",rate)

print("stake ",stake)

I think you can see what I am trying to do from the table below:

https://img1.sycdn.imooc.com/6582504e0001675111800330.jpg

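I tried contacting the site administrator to ask for a class or some other hook to be added to the page source, but got no response.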


from lxml import html

import requests

page = requests.get('https://tipsters.asianbookie.com/index.cfm?player=Mitya68&ID=297754')

tree = html.fromstring(page.content)

ID = tree.xpath('/html/body/table[2]/tbody/tr/td[3]/table[7]')

print(ID)

This is the code I am using. If anyone can help, that would be great =)


宝慕林4294392
1 answer

翻过高山走不出你

A simple approach is to use pandas. Here is how:

from io import StringIO

import pandas as pd
import requests

r = requests.get('https://tipsters.asianbookie.com/index.cfm?player=Mitya68&ID=297754&sortleague=1#playersopenbets&tz=5.5').text

# Parse every <table> on the page into a DataFrame.
# Recent pandas versions expect a file-like object rather than a raw HTML string.
dfs = pd.read_html(StringIO(r))

# The "Open Bets" table was at index 141 when this was written.
df = dfs[141]

# Promote the first row to column headers, then drop it from the data.
df.columns = df.iloc[0]
df = df.drop(0)

# Keep only the text after the last '.' in the "Bet Placed" column.
df['Bet Placed ≡'] = [value.split('.')[-1] for value in df['Bet Placed ≡']]

print(df)

Output:

0   Bet Placed ≡              Team A  ...   Rate         Pending Status
1    9 hours ago         Real Madrid  ...  1.975            pending ?-?
2    9 hours ago   Red Bull Salzburg  ...  1.875            pending ?-?
3    9 hours ago                Ajax  ...   2.00            pending ?-?
4    9 hours ago       Bayern Munich  ...   2.00            pending ?-?
5    9 hours ago       Bayern Munich  ...   1.85            pending ?-?
6    9 hours ago         Inter Milan  ...  1.875            pending ?-?
7    9 hours ago     Manchester City  ...   1.95            pending ?-?
8    9 hours ago         Midtjylland  ...  1.875            pending ?-?
9    9 hours ago  Olympiakos Piraeus  ...   1.95            pending ?-?
10   9 hours ago          Hamburg SV  ...  1.925            pending ?-?
11   9 hours ago         Vissel Kobe  ...  1.925   Lost(-25,000) FT 1-3
12   9 hours ago     Shonan Bellmare  ...  1.825   Won½(+10,313) FT 0-0
13   9 hours ago    Yokohama Marinos  ...  2.025   Won½(+12,812) FT 2-1
14   9 hours ago        RKC Waalwijk  ...  1.875            pending ?-?
15   9 hours ago            Espanyol  ...  2.075  lose(-25,000) 29' 1-0

[15 rows x 7 columns]

You can also get these values as separate lists by adding these lines to the code:

team_a = list(df['Team A'])
team_b = list(df['Team B'])
rate = list(df['Rate'])
stake = list(df['Stake'])

If you want to print them in the format you mentioned, add these lines to your code:

final_lst = zip(team_a, team_b, stake, rate)
for teamA, teamB, stakee, ratee in final_lst:
    print(f"{teamA} vs {teamB} - Stake: {stakee}, Rate: {ratee}")

Output:

Real Madrid vs Shaktar Donetsk - Stake: 25000.00, Rate: 1.975
Red Bull Salzburg vs Lokomotiv Moscow - Stake: 100000.00, Rate: 1.875
Ajax vs Liverpool - Stake: 25000.00, Rate: 2.00
Bayern Munich vs Atl. Madrid - Stake: 25000.00, Rate: 2.00
Bayern Munich vs Atl. Madrid - Stake: 25000.00, Rate: 1.85
Inter Milan vs Monchengladbach - Stake: 25000.00, Rate: 1.875
Manchester City vs Porto - Stake: 25000.00, Rate: 1.95
Midtjylland vs Atalanta - Stake: 100000.00, Rate: 1.875
Olympiakos Piraeus vs Marseille - Stake: 25000.00, Rate: 1.95
Hamburg SV vs Erzgebirge Aue - Stake: 100000.00, Rate: 1.925
Vissel Kobe vs Kashima Antlers - Stake: 25000.00, Rate: 1.925
Shonan Bellmare vs Sagan Tosu - Stake: 25000.00, Rate: 1.825
Yokohama Marinos vs Nagoya - Stake: 25000.00, Rate: 2.025
RKC Waalwijk vs PEC Zwolle - Stake: 25000.00, Rate: 1.875
Espanyol vs Mirandes - Stake: 25000.00, Rate: 2.075
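A positional lookup breaks as soon as the page layout shifts, so it may be worth selecting the table by its header cells instead. The sketch below is only an illustration under stated assumptions: SAMPLE is a made-up, well-formed stand-in for the page (the real markup was not inspected here), find_table_by_headers is a hypothetical helper, and the standard-library XML parser is used only so the example runs without extra packages. It also shows one possible reason the XPath in the question returns an empty list: browsers display a tbody element in their inspector even when the raw HTML has none, so a path copied from the inspector can fail against the downloaded source.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page: nested tables with no
# class or id, and no <tbody> in the raw markup.
SAMPLE = """
<html><body>
  <table><tr><td>Menu</td></tr></table>
  <table><tr><td>
    <table>
      <tr><td>Team A</td><td>Team B</td><td>Stake</td><td>Rate</td></tr>
      <tr><td>Ajax</td><td>Liverpool</td><td>25000.00</td><td>2.00</td></tr>
      <tr><td>Espanyol</td><td>Mirandes</td><td>25000.00</td><td>2.075</td></tr>
    </table>
  </td></tr></table>
</body></html>
"""


def cell_texts(row):
    """Return the stripped text of each direct <td> child of a row."""
    return [(td.text or "").strip() for td in row.findall("td")]


def find_table_by_headers(root, required):
    """Return the data rows (as dicts) of the first table whose first row
    contains every header in `required`; return [] if no table matches."""
    for table in root.iter("table"):
        rows = table.findall("tr")  # direct child rows only
        if not rows:
            continue
        headers = cell_texts(rows[0])
        if set(required) <= set(headers):
            return [dict(zip(headers, cell_texts(row))) for row in rows[1:]]
    return []


root = ET.fromstring(SAMPLE)

# A path that includes tbody matches nothing, because the raw markup has none.
print(root.findall("./body/table/tbody/tr"))

bets = find_table_by_headers(root, ["Team A", "Team B", "Stake", "Rate"])
for bet in bets:
    print(f"{bet['Team A']} vs {bet['Team B']} - Stake: {bet['Stake']}, Rate: {bet['Rate']}")
```

The same idea should carry over to lxml or pandas on the real page: match on the "Team A" and "Team B" header text rather than on a table's position, and drop tbody from any XPath copied out of the browser.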