Scraping with requests.

What is wrong with my code? I am trying to fetch the same content as https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG, but the result is different from what I want.


import requests
from bs4 import BeautifulSoup

s = requests.Session()
s.headers.update({'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.186 Safari/537.36'})

response = s.get('https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG')

soup = BeautifulSoup(response.text, 'lxml')
print(soup.prettify())


largeQ

Viewed 166 times

2 Answers

宝慕林4294392

You can use requests and pass in parameters to get the JSON for the train info and the prices. I have not parsed out all of the info, since this is just to show you that it is possible. I do parse out the train IDs so that I can make the follow-up requests for the price info, which is linked to the train info by ID. (Note: the original code listed 'query[brand_ids][]' twice in the params dict, so the second key silently overwrote the first; passing a list makes requests send both values.)

import requests

url = 'https://koleo.pl/pl/connections/?'

headers = {
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Connection': 'keep-alive',
    'Cookie': '_ga=GA1.2.2048035736.1553000429; _gid=GA1.2.600745193.1553000429; _gat=1; _koleo_session=bkN4dWRrZGx0UnkyZ3hjMWpFNGhiS1I3TzhQMGNyWitvZlZ0QVRUVVVtWUFPMUwxL0hJYWJyYnlGTUdHYXNuL1N6QlhHMHlRZFM3eFZFcjRuK3ZubllmMjdSaU5CMWRBSTFOc1JRc2lDUGV0Y2NtTjRzbzZEd0laZWI1bjJoK1UrYnc5NWNzZzNJdXVtUlpnVE15QnRnPT0tLTc1YzV1Q2xoRHF4VFpWWTdWZDJXUnc9PQ%3D%3D--3b5fe9bb7b0ce5960bc5bd6a00bf405df87f8bd4',
    'Host': 'koleo.pl',
    'Referer': 'https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36',
    'X-CSRF-Token': 'heag3Y5/fh0hyOfgdmSGJBmdJR3Perle2vJI0VjB81KClATLsJxFAO4SO9bY6Ag8h6IkpFieW1mtZbD4mga7ZQ==',
    'X-Requested-With': 'XMLHttpRequest'
}

params = {
    'v': 'a0dec240d8d016fbfca9b552898aba9c38fc19d5',
    'query[date]': '19-03-2019 10:00:00',
    'query[start_station]': 'krakow-glowny',
    'query[end_station]': 'radom',
    'query[brand_ids][]': ['29', '28'],  # list value so both brand ids are sent
    'query[only_direct]': 'false',
    'query[only_purchasable]': 'false'
}

with requests.Session() as s:
    data = s.get(url, params=params, headers=headers).json()
    print(data)
    priceUrl = 'https://koleo.pl/pl/prices/{}?v=a0dec240d8d016fbfca9b552898aba9c38fc19d5'
    for item in data['connections']:
        r = s.get(priceUrl.format(item['id'])).json()
        print(r)
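The chaining pattern in this answer (take each connection's id from the first JSON response, then build the per-connection price URL from it) can be sketched offline with a stubbed payload, since the live endpoint needs fresh session cookies and a current CSRF token. The payload shape below is an assumption mirroring the loop above: data['connections'] as a list of dicts, each carrying an 'id'.

```python
# Stubbed connections payload, shaped like the JSON the answer iterates over
# (assumption: data['connections'] is a list of dicts, each with an 'id').
data = {'connections': [{'id': 12345}, {'id': 12346}]}

price_url = 'https://koleo.pl/pl/prices/{}?v=a0dec240d8d016fbfca9b552898aba9c38fc19d5'

# Build one follow-up price-request URL per connection id.
urls = [price_url.format(item['id']) for item in data['connections']]
for u in urls:
    print(u)
```

With live data you would pass each of these URLs back through the same Session, as the answer's loop does.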

喵喔喔

You have to use selenium to get the dynamically generated content. Then you can parse the HTML with BeautifulSoup. For example, here I parse out the dates:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://koleo.pl/rozklad-pkp/krakow-glowny/radom/19-03-2019_10:00/all/EIP-IC--EIC-EIP-IC-KM-REG')
soup = BeautifulSoup(driver.page_source, 'lxml')

for div in soup.findAll("div", {"class": 'date custom-panel'}):
    date = div.findAll("div", {"class": 'row'})[0].string.strip()
    print(date)

Output:

wtorek, 19 marca
środa, 20 marca
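The BeautifulSoup half of this answer works the same on any HTML string, so the date-extraction logic can be tried without launching a browser. The fragment below is a hand-written assumption about the page's markup, mirroring the 'date custom-panel' and 'row' classes used above, and it uses the stdlib 'html.parser' so no lxml install is needed:

```python
from bs4 import BeautifulSoup

# Minimal hand-made fragment mimicking the structure the answer targets
# (an assumption about the real page's markup, not a capture of it).
html = '''
<div class="date custom-panel">
  <div class="row"> wtorek, 19 marca </div>
</div>
<div class="date custom-panel">
  <div class="row"> środa, 20 marca </div>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')
dates = []
for div in soup.find_all('div', {'class': 'date custom-panel'}):
    # The first 'row' div inside each panel holds the date text.
    dates.append(div.find_all('div', {'class': 'row'})[0].string.strip())
print(dates)
```

Once the selenium-rendered page source replaces the stub, the same loop yields the dates shown in the answer's output.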
