Scrape a tennis results table and attach the tournament to each row

I want to scrape the match results from this page: https://www.tennisexplorer.com/player/paire-4a33b/


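From the scraped results I want to build a table with the following columns: tournament, date, match_player_1, match_player_2, round, score. I have written some code and it works, but I don't know how to add the tournament to each match row.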


import requests
from bs4 import BeautifulSoup

u = 'https://www.tennisexplorer.com/player/paire-4a33b/'
# `headers` was not defined in the original snippet; a browser-like User-Agent is assumed here
headers = {'User-Agent': 'Mozilla/5.0'}

r = requests.get(u, timeout=120, headers=headers)
# print(r.status_code)
soup = BeautifulSoup(r.content, 'html.parser')

# walk every row of the 2020 matches table and read the first three cells
for tr in soup.select('#matches-2020-1-data tr'):
    match_date = tr.select_one('td:nth-of-type(1)').get_text(strip=True)
    match_surface = tr.select_one('td:nth-of-type(2)').get_text(strip=True)
    match = tr.select_one('td:nth-of-type(3)').get_text(strip=True)
    #...

I need to create a table like this:

tournament                      date    match_player_1  match_player_2  round   score
Cincinnati Masters (New York)   22.08.  Coric B.        Paire B.        1R      6-0, 1-0
Ultimate Tennis Showdown 2      01.08.  Moutet C.       Paire B.        NaN     15-0, 15-0, 15-0, 15-0

How can I associate the tournament with each match?


慕妹3146593
2 answers

30秒到达战场

To get the desired DataFrame, you can do this:

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.tennisexplorer.com/player/paire-4a33b/'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

all_data = []
# skip rows that contain <th> cells, keeping only the match rows
for row in soup.select('#matches-2020-1-data tr:not(:has(th))'):
    tds = [td.get_text(strip=True, separator=' ') for td in row.select('td')]
    all_data.append({
        # the nearest preceding 'head flags' row holds the tournament name
        'tournament': row.find_previous('tr', class_='head flags').find('td').get_text(strip=True),
        'date': tds[0],
        'match_player_1': tds[2].split('-')[0].strip(),
        'match_player_2': tds[2].split('-')[-1].strip(),
        'round': tds[3],
        'score': tds[4]
    })

df = pd.DataFrame(all_data)
df.to_csv('data.csv')

This saves the table to data.csv.
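One detail worth checking in the answer above: `split('-')` also splits on hyphens inside a surname, so a hyphenated name would lose everything after its first hyphen. The sketch below is one possible alternative, assuming the match cell text has the form "Player A - Player B" with spaces around the separator (this is an assumption about the page, not something the answer states). `split_players` is a hypothetical helper name and "Auger-Aliassime F." is only an illustrative input.

```python
from typing import Tuple


def split_players(match: str) -> Tuple[str, str]:
    """Split 'Player A - Player B' into its two names.

    Assumes the two names are separated by ' - ' (hyphen with spaces),
    so hyphens inside a surname are left alone.
    """
    first, sep, second = match.partition(" - ")
    if not sep:
        # no separator found: fail loudly instead of guessing
        raise ValueError(f"no ' - ' separator in {match!r}")
    return first.strip(), second.strip()


print(split_players("Coric B. - Paire B."))
print(split_players("Auger-Aliassime F. - Paire B."))
```

In the answer's dictionary this would replace the two `tds[2].split('-')` expressions with a single call whose result is unpacked into `match_player_1` and `match_player_2`.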

一只甜甜圈

Try this:

import pandas as pd

url = "https://www.tennisexplorer.com/player/paire-4a33b/"
df = pd.read_html(url)[8]  # the matches table is taken to be the table at index 8 on this page

new_data = {"tournament": [], "date": [], "match_player_1": [], "match_player_2": [],
            "round": [], "score": []}

for index, row in df.iterrows():
    try:
        # match rows start with a date such as "22.08."; without the trailing dot it parses as a float
        date = float(row.iloc[0][:-1])
        new_data["tournament"].append(tourn)
        new_data["date"].append(row.iloc[0])
        new_data["match_player_1"].append(row.iloc[2].split("-")[0])
        new_data["match_player_2"].append(row.iloc[2].split("-")[1])
        new_data["round"].append(row.iloc[3])
        new_data["score"].append(row.iloc[4])
    except Exception as e:
        # any row that fails is treated as a tournament header; remember its name for the rows below
        tourn = row.iloc[0]

data = pd.DataFrame(new_data)
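The answer above relies on a failed `float()` conversion to recognise tournament rows. The same "remember the last header and stamp it on each following row" idea can be sketched without the broad `except`, using an explicit date pattern instead. This is only an illustration in plain Python: `attach_tournament`, `DATE_RE`, and `SAMPLE_ROWS` are hypothetical names, the sample rows are modelled on the table in the question, and the empty strings stand in for cells (surface, round) whose real values are not given there.

```python
import re
from typing import Iterable, List, Optional, Sequence

DATE_RE = re.compile(r"\d{2}\.\d{2}\.")  # matches dates such as "22.08."


def attach_tournament(rows: Iterable[Sequence[str]]) -> List[dict]:
    """Carry the most recent tournament name forward onto each match row."""
    records: List[dict] = []
    tournament: Optional[str] = None
    for row in rows:
        first_cell = row[0]
        if DATE_RE.fullmatch(first_cell):
            # a match row: date, surface, "A - B", round, score
            player_1, _, player_2 = row[2].partition(" - ")
            records.append({
                "tournament": tournament,
                "date": first_cell,
                "match_player_1": player_1.strip(),
                "match_player_2": player_2.strip(),
                "round": row[3],
                "score": row[4],
            })
        else:
            # anything that is not a date is treated as a tournament header
            tournament = first_cell
    return records


# sample rows modelled on the question's table; "" marks cells not given there
SAMPLE_ROWS = [
    ["Cincinnati Masters (New York)"],
    ["22.08.", "", "Coric B. - Paire B.", "1R", "6-0, 1-0"],
    ["Ultimate Tennis Showdown 2"],
    ["01.08.", "", "Moutet C. - Paire B.", "", "15-0, 15-0, 15-0, 15-0"],
]

records = attach_tournament(SAMPLE_ROWS)
for record in records:
    print(record)
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(...)`, as the first answer does with `all_data`.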