Scraping table rows that need to be associated with a preceding element

I want to scrape the table from this site: https://www.oddsportal.com/moving-margins/

I need the data inside the table #moving_margins_content_overall

I tried the code below, but some games contain many class="odd" rows, and I don't know how to associate the class="odd" data with the class="dark" data.

import requests
from bs4 import BeautifulSoup
import time
import json
import csv
from selenium import webdriver

u = 'https://www.oddsportal.com/moving-margins/'

driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe")
driver.get(u)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
driver.implicitly_wait(60)  # seconds
time.sleep(2)

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("innerHTML")
soup = BeautifulSoup(source_code, 'html.parser')

for k in soup.select('#moving_margins_content_overall .table-main tbody tr'):
    sport = k.select_one('tr.dark th > a').get_text(strip=True)  # sport
    country = soup.select_one('tr.dark th a:nth-child(3) span').get_text(strip=True)  # country
    competition = soup.select_one('tr.dark th a:nth-child(5)').get_text(strip=True)  # competition

1 Answer

PIPIONE

You can use the code below to store all the data in one list, where each row on the page is stored as its own list.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

u = 'https://www.oddsportal.com/moving-margins/'

driver = webdriver.Chrome(executable_path=r"C:\chromedriver.exe")
driver.maximize_window()
driver.get(u)

# Use an explicit wait (instead of a fixed sleep) for faster execution
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#moving_margins_content_overall")))
driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")

table_data = driver.find_elements_by_xpath("//div[@id='moving_margins_content_overall']//tr[@class='odd' or @class='dark']")
table = []

# Build a list of lists, where each inner list holds all the data of one row,
# whether it has class="dark" or class="odd"
for data in table_data:
    row = []
    dark_row = data.find_elements_by_xpath(".//th//a")
    for col in dark_row:
        row.append(col.text.replace("\n", " "))
    row.append(data.find_element_by_xpath(".//following-sibling::tr//th[@class='first2']").text)  # add data from the first2 th
    odd_row = data.find_elements_by_xpath(".//following-sibling::tr[@class='odd']//td")
    for col in odd_row:
        row.append(col.text.replace("\n", " "))
    row.append(odd_row[-1].find_element_by_xpath('.//a').get_attribute("title"))  # add the bookmaker name
    table.append(row)

for t in table:
    print(t)

Output: as you can see, the Rugby League match has two sets of odds, so the list for that match is longer.
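If you want to persist the result rather than just print it, here is a small follow-up sketch: since the question already imports csv, the table list of lists can be written out directly. The filename 'moving_margins.csv' is only an example, not anything defined by the page or the answer above:

import csv

# Write each scraped row of `table` as one CSV line
with open('moving_margins.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(table)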
