Parsing a table in Python with BeautifulSoup and Selenium

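https://rocketreach.co/horizon-blue-cross-blue-shield-of-new-jersey-email-format_b5c604a3f42e0c54 is the link I'm trying to get information from. I need to extract the formats listed in the table, such as "first '_' last", "first_initial last", and so on. If not all of them, then at least the top format.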

Here is what I have so far:

    def search_on_google(key_word, driver):
        driver.get("https://www.google.com/")
        searchBoard = driver.find_element_by_name('q')
        searchBoard.send_keys(key_word + " Rocketreach.co")
        searchBoard.send_keys(Keys.TAB)
        searchBoard.send_keys(Keys.ENTER)
        driver.find_element_by_tag_name("cite").click()
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        for link in soup.find_all('meta'):
            content = link.get('content')
            print(content)

Edit:

    for i in range(1):
        driver.find_element_by_tag_name("cite").click()
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        WebDriverWait(driver, 10).until(EC.presence_of_element_located(
            (By.XPATH, "//table/tbody/tr[1]/td[1][not(contains(text(), 'Lorem ipsum...'))]")))
        table_id = driver.find_element(By.TAG_NAME, "tbody")
        rows = table_id.find_elements(By.TAG_NAME, "tr")
        for row in rows:
            tds = row.find_elements(By.TAG_NAME, "td")
            top_format.append(tds[0].text)
            domain.append(tds[1].text)
        print(top_format)
        print(domain)
        break
    return top_format

holdtom

1 Answer

不负相思意

There is only one table on this page, and it is not inside any iframe, so you can print all of its contents like this:

    one_urls = []  # collects the text of every cell
    WebDriverWait(driver, 10).until(EC.presence_of_element_located(
        (By.XPATH, "//table/tbody/tr[1]/td[1][not(contains(text(), 'Lorem ipsum...'))]")))
    table_id = driver.find_element(By.TAG_NAME, "tbody")
    rows = table_id.find_elements(By.TAG_NAME, "tr")
    for row in rows:
        tds = row.find_elements(By.TAG_NAME, "td")
        for td in tds:
            one_urls.append(td.text)
    print(one_urls)

You can also add a check before printing, or a range check. Note that tds[0] is a WebElement, so compare its text rather than the element itself:

    if tds[0].text == '':

I would also suggest waiting before looking up the table, because you click and load a new page before fetching it:

    table_id = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.TAG_NAME, "tbody")))

Import these:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
