Accessing detail information from anchor tags on a web page

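I'm scraping a web page and have managed to extract the table data into a CSV file using Selenium. What I'm struggling with is getting the information behind the anchor tag on each row of the table.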


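I tried clicking every anchor tag in the table to get the information from the corresponding URL, but it stops after clicking the first one with this error: stale element reference: element is not attached to the page document. I'm not sure this is the right way to approach the problem. Here is the code I have tried so far. Apologies if the code is badly formatted; I'm new to Python and Stack Overflow.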


    import csv
    import requests
    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    browser = webdriver.Chrome(executable_path=r"D:\jewel\chromedriver.exe")
    browser.get('https://e-sourcingni.bravosolution.co.uk/web/login.shtml')
    signInButton = browser.find_element_by_css_selector(".only")
    signInButton.click()
    time.sleep(5)
    table = browser.find_element_by_css_selector(".list-table")

    for a in browser.find_elements_by_css_selector(".detailLink"):
        a.click()
        time.sleep(2)
        browser.execute_script("window.history.go(-1)")
        time.sleep(2)

    with open('output.csv', "w") as f:
        writer = csv.writer(f)
        writer.writerow(["S.No","Status","Organization","Project Title","First Publishing Date","Work Category","Listing Deadline"])
        for row in table.find_elements_by_css_selector('tr'):
            writer.writerow([d.text for d in row.find_elements_by_css_selector('td')])

    browser.close()

What I need is to get the data from the href of the a tags with class detailLink. I haven't been able to find a proper way to do this.


人到中年有点甜
2 Answers

猛跑小猪

I used a plain index-based for loop to iterate over the links instead of a for-each loop, and I look the links up again on every pass, because going back reloads the page and the references found earlier become stale. Try this and let me know how it goes.

    import csv
    import time

    from selenium import webdriver

    browser = webdriver.Chrome('/usr/local/bin/chromedriver')  # Optional argument; if not specified, the PATH is searched.
    browser.implicitly_wait(5)
    browser.execute_script("window.open('about:blank','tab1');")
    browser.switch_to.window("tab1")
    browser.get('https://e-sourcingni.bravosolution.co.uk/web/login.shtml')
    signInButton = browser.find_element_by_css_selector(".only")
    signInButton.click()
    time.sleep(5)

    # Count the links once, then re-locate them on each iteration so that
    # links[i] always refers to an element of the page currently loaded.
    links = browser.find_elements_by_css_selector(".detailLink")
    for i in range(len(links)):
        links = browser.find_elements_by_css_selector(".detailLink")
        links[i].click()
        time.sleep(2)
        browser.execute_script("window.history.go(-1)")
        time.sleep(2)

    # newline="" keeps the csv module from writing blank lines on Windows (Python 3).
    with open('output.csv', "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["S.No","Status","Organization","Project Title","First Publishing Date","Work Category","Listing Deadline"])
        rows = browser.find_elements_by_xpath("//table[@class='list-table']//tr")
        # XPath positions are 1-based, so run from 1 to len(rows) inclusive.
        for row in range(1, len(rows) + 1):
            cells = browser.find_elements_by_xpath("//table[@class='list-table']//tr[" + str(row) + "]//td")
            writer.writerow([d.text for d in cells])

    browser.close()
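
A related idea, offered only as a rough sketch: rather than holding on to the elements, copy whatever you need out of them into plain strings before navigating anywhere. Strings cannot go stale. The helper name `snapshot_links` and the choice of attributes are my own assumptions, not something from the question; it only relies on each element exposing `text` and `get_attribute(name)`, which Selenium's elements do.

```python
def snapshot_links(elements):
    """Copy text, href and onclick out of each element into plain dicts.

    `elements` is any iterable of objects that offer a `text` attribute and
    a `get_attribute(name)` method, for example the list returned by
    browser.find_elements_by_css_selector(".detailLink").
    """
    snapshots = []
    for el in elements:
        snapshots.append({
            "text": el.text,
            "href": el.get_attribute("href"),
            "onclick": el.get_attribute("onclick"),
        })
    return snapshots


# Hypothetical usage with a Selenium `browser` that is showing the list page:
#   links = snapshot_links(browser.find_elements_by_css_selector(".detailLink"))
#   for link in links:
#       ...  # navigate using link["href"] or link["onclick"]
```

Because the loop afterwards works on dictionaries rather than live elements, it does not matter how often the page is reloaded in between.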

海绵宝宝撒

Yes. Clicking the link takes you to the next page, and once the page has changed the elements found on the previous page can no longer be located. You can try this instead, which loads each detail page in a second tab so the list page in the first tab is never reloaded:

    import csv
    import requests
    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    browser = webdriver.Chrome(executable_path=r"D:\jewel\chromedriver.exe")
    browser.execute_script("window.open('about:blank','tab1');")
    browser.switch_to.window("tab1")
    browser.get("https://e-sourcingni.bravosolution.co.uk/web/login.shtml")
    signInButton = browser.find_element_by_css_selector(".only")
    signInButton.click()
    time.sleep(5)
    table = browser.find_element_by_css_selector(".list-table")

    for a in table.find_elements_by_tag_name("a"):
        try:
            if a.get_attribute("class") == "detailLink":
                # Strip everything around the opportunity id in the onclick handler.
                opp_id = a.get_attribute("onclick")
                opp_id = opp_id.replace("javascript:goToDetail('", "")
                opp_id = opp_id.replace("', '02260');stopEventPropagation(event);", "")
                a_href = a.get_attribute("href")
                # Open the detail page in tab2; tab1 keeps showing the list.
                browser.execute_script("window.open('about:blank','tab2');")
                browser.switch_to.window("tab2")
                browser.get("https://e-sourcingni.bravosolution.co.uk/esop/toolkit/opportunity/opportunityDetail.do?opportunityId=" + opp_id + "&oppList=CURRENT")
                time.sleep(2)  # wait for the element to load
                browser.switch_to.window("tab1")
        except Exception:
            print("detailLink is not present in the a tag class")

    with open('output.csv', "w") as f:
        writer = csv.writer(f)
        writer.writerow(["S.No","Status","Organization","Project Title","First Publishing Date","Work Category","Listing Deadline"])
        for row in table.find_elements_by_css_selector('tr'):
            writer.writerow([d.text for d in row.find_elements_by_css_selector('td')])

    browser.close()
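
The two `replace` calls above only work while the second argument of `goToDetail` is exactly `'02260'`. A slightly more tolerant way to pull the id out is a regular expression; the following is only a sketch under that assumption about the handler's shape. The function names `opportunity_id_from_onclick` and `detail_url` are made up for illustration, and the URL and query parameters are the ones already used in this answer.

```python
import re
from urllib.parse import urlencode

DETAIL_URL = ("https://e-sourcingni.bravosolution.co.uk/esop/toolkit/"
              "opportunity/opportunityDetail.do")

# Matches goToDetail('<id>', ...) and captures the first quoted argument.
_GO_TO_DETAIL = re.compile(r"goToDetail\(\s*'([^']+)'")


def opportunity_id_from_onclick(onclick):
    """Return the first argument of goToDetail(...), or None if it is absent."""
    if not onclick:
        return None
    match = _GO_TO_DETAIL.search(onclick)
    return match.group(1) if match else None


def detail_url(opportunity_id):
    """Build the detail page URL with the query parameters used in this answer."""
    query = urlencode({"opportunityId": opportunity_id, "oppList": "CURRENT"})
    return f"{DETAIL_URL}?{query}"
```

Inside the loop you would then call `opportunity_id_from_onclick(a.get_attribute("onclick"))`, skip the link when it returns `None`, and pass the result to `detail_url` before `browser.get(...)`.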
