
Unable to scrape the link attached to a download button from a page using requests

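The link I'm trying to scrape is generated dynamically. So far I've found that there is usually some way to locate such links, but in this case I can't manage it.

I tried: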

import requests
from bs4 import BeautifulSoup

link = "https://finance.yahoo.com/quote/AAPL/history?p=AAPL"

r = requests.get(link)
soup = BeautifulSoup(r.text,"html.parser")
file_link = soup.select_one("a[href='/finance/download/']").get("href")
print(file_link)

With the attempt above, the script raises an AttributeError because select_one() finds no matching link in the page and returns None.
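The error can be avoided by checking for a missing match before reading the attribute. Here is a minimal sketch of that check. It uses the standard library's html.parser instead of BeautifulSoup, a swap made only so that it runs without third-party packages. FirstLinkFinder, find_link and the static_html sample are hypothetical names and data, not part of the original script.

```python
from html.parser import HTMLParser


class FirstLinkFinder(HTMLParser):
    """Record the first <a href=...> whose href contains a given fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.found = None  # stays None when nothing matches

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.found is None:
            href = dict(attrs).get("href")
            if href and self.fragment in href:
                self.found = href


def find_link(html, fragment):
    """Return the first matching href, or None if the markup has no such link."""
    finder = FirstLinkFinder(fragment)
    finder.feed(html)
    return finder.found


# Hypothetical static markup standing in for what a plain HTTP client receives:
# the download anchor is absent because a script would add it later.
static_html = '<html><body><a href="/quote/AAPL">AAPL</a></body></html>'

link = find_link(static_html, "/finance/download/")
if link is None:
    print("download link not present in the static HTML")
else:
    print(link)
```

With BeautifulSoup the same guard would be a None check on the result of select_one() before calling .get("href").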


How can I get the download link from that page using requests?


函数式编程
1 answer

料青山看我应如是

The link for downloading the CSV appears to be built dynamically by JavaScript, so it is not present in the HTML that requests receives. You can, however, construct an equivalent link in Python:

import requests
from datetime import datetime

csv_link = 'https://query1.finance.yahoo.com/v7/finance/download/{quote}?period1={from_}&period2={to_}&interval=1d&events=history'

quote = 'AAPL'
from_ = datetime(2019,9,27,0,0).strftime('%s')
to_ = datetime(2020,9,27,23,59).strftime('%s')

print(requests.get(csv_link.format(quote=quote, from_=from_, to_=to_)).text)

Prints:

Date,Open,High,Low,Close,Adj Close,Volume
2019-09-27,220.539993,220.960007,217.279999,218.820007,216.670242,25352000
2019-09-30,220.899994,224.580002,220.789993,223.970001,221.769623,25977400
2019-10-01,225.070007,228.220001,224.199997,224.589996,222.383545,34805800
2019-10-02,223.059998,223.580002,217.929993,218.960007,216.808853,34612300
2019-10-03,218.429993,220.960007,215.130005,220.820007,218.650574,28606500
2019-10-04,225.639999,227.490005,223.889999,227.009995,224.779770,34619700
2019-10-07,226.270004,229.929993,225.839996,227.059998,224.829269,30576500
2019-10-08,225.820007,228.059998,224.330002,224.399994,222.195404,27955000
2019-10-09,227.029999,227.789993,225.639999,227.029999,224.799576,18692600
2019-10-10,227.929993,230.440002,227.300003,230.089996,227.829498,28253400
2019-10-11,232.949997,237.639999,232.309998,236.210007,233.889374,41698900
2019-10-14,234.899994,238.130005,234.669998,235.869995,233.552719,24106900
2019-10-15,236.389999,237.649994,234.880005,235.320007,233.008133,21840000
2019-10-16,233.369995,235.240005,233.199997,234.369995,232.067444,18475800
2019-10-17,235.089996,236.149994,233.520004,235.279999,232.968521,16896300
2019-10-18,234.589996,237.580002,234.289993,236.410004,234.087433,24358400
2019-10-21,237.520004,240.990005,237.320007,240.509995,238.147125,21811800
2019-10-22,241.160004,242.199997,239.619995,239.960007,237.602539,20573400
2019-10-23,242.100006,243.240005,241.220001,243.179993,240.790909,18957200

...and so on.

Edit: a variant that computes the timestamps with datetime.timestamp() instead of strftime('%s'), which is platform-dependent:

import requests
from datetime import datetime

csv_link = 'https://query1.finance.yahoo.com/v7/finance/download/{quote}?period1={from_}&period2={to_}&interval=1d&events=history'

quote = 'AAPL'
from_ = str(datetime.timestamp(datetime(2019,9,27,0,0))).split('.')[0]
to_ = str(datetime.timestamp(datetime(2020,9,27,23,59))).split('.')[0]

print(requests.get(csv_link.format(quote=quote, from_=from_, to_=to_)).text)
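The URL construction above can also be wrapped in a small helper, and the returned CSV text can be turned into rows. The following is only a sketch under stated assumptions: build_csv_url and parse_history are hypothetical names, the dates are interpreted as UTC (the answer's code uses local time), and the endpoint and its parameters are taken from the answer with no guarantee that the service still responds this way. No network request is made here; the sample text is copied from the output shown in the answer.

```python
import csv
import io
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint as given in the answer; assumed, not verified, to still be served.
BASE = "https://query1.finance.yahoo.com/v7/finance/download/"


def build_csv_url(quote, start, end, interval="1d"):
    """Build the download URL with period1/period2 as whole epoch seconds.

    Assumption: naive datetimes are treated as UTC, so the result does not
    depend on the local time zone of the machine running the script.
    """
    params = {
        "period1": int(start.replace(tzinfo=timezone.utc).timestamp()),
        "period2": int(end.replace(tzinfo=timezone.utc).timestamp()),
        "interval": interval,
        "events": "history",
    }
    return BASE + quote + "?" + urlencode(params)


def parse_history(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


url = build_csv_url("AAPL", datetime(2019, 9, 27), datetime(2020, 9, 27, 23, 59))
print(url)

# Two rows copied from the answer's output, used in place of a live response.
sample = (
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2019-09-27,220.539993,220.960007,217.279999,218.820007,216.670242,25352000\n"
    "2019-09-30,220.899994,224.580002,220.789993,223.970001,221.769623,25977400\n"
)
for row in parse_history(sample):
    print(row["Date"], row["Close"], row["Volume"])
```

To fetch live data, the text passed to parse_history would be requests.get(url).text, as in the answer.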