BeautifulSoup cannot find the table on the web page

So I found my problem. It was earlier in the code, where I originally sliced change_details out of another DataFrame.


change_details = gdp_sched_today[['start_date', 'end_date']]

change_details.columns = ['Planned Start Date', 'Planned End Date']

change_details['Planned Start Date'] = change_details['Planned Start Date'].dt.strftime('%d/%m/%Y %H:%M')

change_details['Planned End Date'] = change_details['Planned End Date'].dt.strftime('%d/%m/%Y %H:%M')

I was able to fix it by adding .copy() to the first line, so that pandas knows I intend change_details to be a copy rather than a view.


change_details = gdp_sched_today[['start_date', 'end_date']].copy()

change_details.columns = ['Planned Start Date', 'Planned End Date']

change_details['Planned Start Date'] = change_details['Planned Start Date'].dt.strftime('%d/%m/%Y %H:%M')

change_details['Planned End Date'] = change_details['Planned End Date'].dt.strftime('%d/%m/%Y %H:%M')
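For anyone who wants to try the fix in isolation, here is a minimal self-contained sketch. The contents of gdp_sched_today are made up for illustration (the real frame is not shown in the post), and the extra owner column is only there so that the slice is a genuine subset.

```python
import pandas as pd

# Hypothetical stand-in for gdp_sched_today; the real data is not shown in the post.
gdp_sched_today = pd.DataFrame({
    "start_date": pd.to_datetime(["2020-10-01 09:30", "2020-10-02 14:00"]),
    "end_date": pd.to_datetime(["2020-10-01 11:00", "2020-10-02 16:45"]),
    "owner": ["a", "b"],
})

# .copy() makes change_details an independent DataFrame, so the column
# assignments below cannot be mistaken for writes into a slice of the original.
change_details = gdp_sched_today[["start_date", "end_date"]].copy()
change_details.columns = ["Planned Start Date", "Planned End Date"]

# Format both datetime columns as day/month/year hour:minute strings.
for col in list(change_details.columns):
    change_details[col] = change_details[col].dt.strftime("%d/%m/%Y %H:%M")

print(change_details)
print(gdp_sched_today.dtypes)  # the source frame keeps its datetime columns
```

The original frame is left untouched, which is usually what you want when the slice is only used for display.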

It would be nice if the warning said more clearly what triggered it :)


慕村225694
2 answers

隔江千里

The table is inside an iframe, so you need to switch to the iframe first before you can access the table in it.

Use WebDriverWait() to wait for frame_to_be_available_and_switch_to_it() with the locator below.
Use WebDriverWait() to wait for visibility_of_element_located() with the locator below.

driver.get("https://learn.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"table.cells-centered")))

You need to import the following.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Or you can use the code below with an xpath.

driver.get("https://learn.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]')))

You can go further and load the table data into a pandas DataFrame, then export it to a csv file. You need to import pandas for this.

driver.get("https://learn.microsoft.com/en-us/windows/release-information/")
WebDriverWait(driver,10).until(EC.frame_to_be_available_and_switch_to_it((By.ID,"winrelinfo_iframe")))
table=WebDriverWait(driver,10).until(EC.presence_of_element_located((By.XPATH,'//*[@id="winrelinfo_container"]/table[1]'))).get_attribute('outerHTML')
df=pd.read_html(str(table))[0]
print(df)
df.to_csv("path/to/csv")

To install pandas:

pip install pandas

Then add the following import.

import pandas as pd
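If you would rather not pull in pandas just for the csv step, the sketch below does the same conversion with only the standard library (html.parser and csv). This is a swapped-in technique, not what the answer above uses. The outer_html string is a made-up stand-in for what get_attribute('outerHTML') returns, and TableRows is a hypothetical helper name.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up stand-in for the string returned by get_attribute('outerHTML').
outer_html = """
<table>
  <tr><th>Version</th><th>OS build</th></tr>
  <tr><td>2004</td><td>19041.546</td></tr>
  <tr><td>1909</td><td>18363.1110</td></tr>
</table>
"""

parser = TableRows()
parser.feed(outer_html)

# Write the rows as csv into an in-memory buffer; swap in open(...) for a real file.
buffer = io.StringIO()
csv.writer(buffer).writerows(parser.rows)
csv_text = buffer.getvalue()
print(csv_text)
```

This ignores rowspan/colspan and nested tables, so for anything less regular pandas is probably the easier route.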

撒科打诨

The table is inside an <iframe>, so BeautifulSoup does not see it in the original page:

import requests
from bs4 import BeautifulSoup

url = 'https://learn.microsoft.com/en-us/windows/release-information/'
# Parse the outer page, then fetch and parse the document the iframe points to.
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
soup = BeautifulSoup(requests.get(soup.select_one('iframe')['src']).content, 'html.parser')
for row in soup.select('table tr'):
    print(row.get_text(strip=True, separator='\t'))

Prints:

Version    Servicing option    Availability date    OS build    Latest revision date    End of service: Home, Pro, Pro Education, Pro for Workstations and IoT Core    End of service: Enterprise, Education and IoT Enterprise
2004    Semi-Annual Channel    2020-05-27    19041.546    2020-10-01    2021-12-14    2021-12-14    Microsoft recommends
1909    Semi-Annual Channel    2019-11-12    18363.1110    2020-09-16    2021-05-11    2022-05-10
1903    Semi-Annual Channel    2019-05-21    18362.1110    2020-09-16    2020-12-08    2020-12-08
1809    Semi-Annual Channel    2019-03-28    17763.1490    2020-09-16    2020-11-10    2021-05-11
1809    Semi-Annual Channel (Targeted)    2018-11-13    17763.1490    2020-09-16    2020-11-10    2021-05-11
1803    Semi-Annual Channel    2018-07-10    17134.1726    2020-09-08    End of service    2021-05-11

...and so on.
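To see why the outer page has no table, here is a small offline sketch using only the standard library. The page_url and outer_page values are made up (only the winrelinfo_iframe id comes from this thread), and IframeFinder is a hypothetical helper name. It counts the <table> tags in the outer HTML and collects the iframe src, resolving it against the page URL in case it is relative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Record every iframe src and count the <table> tags in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self.table_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)
        elif tag == "table":
            self.table_count += 1


# Made-up outer page: it contains an iframe but no table of its own.
page_url = "https://example.com/docs/release-information/"
outer_page = """
<html><body>
  <h1>Release information</h1>
  <iframe id="winrelinfo_iframe" src="/frames/release-table.html"></iframe>
</body></html>
"""

finder = IframeFinder()
finder.feed(outer_page)

# Resolve each src against the page URL so relative paths become fetchable.
frame_urls = [urljoin(page_url, src) for src in finder.sources]

print(finder.table_count)  # tables found in the outer page
print(frame_urls)          # documents that would have to be fetched separately
```

Passing the src straight to requests.get() only works when it is already an absolute URL; going through urljoin covers both cases.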