Scraping table data from a .jsp page with Selenium

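I am trying to scrape a table from a .jsp page (details below). The table loads only after data has been entered (train number and journey station).

For your trials, the train number can be 56913 and the journey station can be SBC (once entered, this automatically changes to "KSR Bengaluru").

With the script below I can generate the table, but I cannot extract it (the result prints as an empty list). I need to get the complete table. Can anyone help me figure out how to extract it?

I am very new to web scraping, so if I have made some basic mistakes, please nudge me gently in the right direction.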


import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.action_chains import ActionChains

from bs4 import BeautifulSoup
import soupsieve as sv
import requests
# Activate the following line if you do not want to see the Firefox window.
# Better deactivate it for debugging.
# os.environ['MOZ_HEADLESS'] = '1'

url = 'https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp'

opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get(url)
WebDriverWait(driver, 20)

train_field = driver.find_element_by_id("trnSrchTxt")
train_field.send_keys("56913")
time.sleep(2)
actions = ActionChains(driver)
actions.send_keys('SBC',Keys.ENTER)
actions.perform()

WebDriverWait(driver, 1)
result_table = driver.find_elements_by_id("mapTrnSch")
print(result_table)

Update: In addition to the answer from @MadRay, the following code also fetched the data (I am not sure how robust it is).


import os
import time
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver import Firefox
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
import re

os.environ['MOZ_HEADLESS'] = '1'
opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get('https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp')
WebDriverWait(driver, 20)



慕运维8079593
1 Answer

繁花不似锦

You have to search for the results by class name, not by id:

results = driver.find_elements_by_class_name("mapTrnSch")

All the other code works fine. Important note: you will get two results. The first is the table header and the second is the table content. Here is an example I wrote without WebDriverWait and ActionChains:

import time
from selenium.webdriver import Firefox
from selenium.webdriver.firefox.options import Options

url = 'https://enquiry.indianrail.gov.in/ntes/trainOnMapBh.jsp'

opts = Options()
driver = Firefox(firefox_binary=r"C:\Program Files (x86)\Mozilla Firefox\firefox.exe", options=opts)
driver.get(url)
time.sleep(5)

# Send search data
driver.find_element_by_id("trnSrchTxt").send_keys("56913")  # Train
time.sleep(5)
driver.find_element_by_id("jrnyStn").send_keys('SBC')  # Journey
time.sleep(5)
driver.find_element_by_id("searchTrainInMapBtn").click()  # Submit button (it seems we do not need to click it, but let's click to be sure)
time.sleep(5)

# Get results
results = driver.find_elements_by_class_name("mapTrnSch")
print(results[0].text)  # 1st result: table headers
print(results[1].text)  # 2nd result: table content
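A side note on the waits. A bare `WebDriverWait(driver, 20)` only builds a waiter object; nothing is actually waited for until `.until(...)` is called with a condition. The fixed `time.sleep(5)` calls above work, but they always cost the full five seconds. Below is a minimal sketch of the polling idea behind an explicit wait, written with the standard library only so that it runs without a browser. `wait_until`, `both_tables` and `FakeDriver` are hypothetical names introduced for this illustration and are not part of Selenium. In a real script you would more likely pass a similar condition to Selenium's own `WebDriverWait(driver, timeout).until(...)`.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


def both_tables(driver):
    """Return the header and content elements once both exist, else None."""
    found = driver.find_elements_by_class_name("mapTrnSch")
    return found if len(found) >= 2 else None


class FakeDriver:
    """Hypothetical stand-in for a real driver (assumption for this demo):
    the two "mapTrnSch" elements only show up on the third lookup."""

    def __init__(self, ready_after):
        self.calls = 0
        self.ready_after = ready_after

    def find_elements_by_class_name(self, name):
        self.calls += 1
        if self.calls < self.ready_after:
            return []
        return ["header", "content"]


driver = FakeDriver(ready_after=3)
results = wait_until(lambda: both_tables(driver), timeout=5, poll=0.01)
print(results)  # ['header', 'content']
```

With a real driver, the same `both_tables` check would return as soon as both the header and the content elements are present, rather than after a fixed delay, and it would fail loudly with a timeout if the table never loads.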