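I want to use the MechanicalSoup package in Python to download an Excel file from this ONS web page. I have read the MechanicalSoup documentation, and I have searched extensively on StackOverflow and elsewhere for an example, without luck.
My attempt is: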
# Install dependencies
# pip install requests
# pip install BeautifulSoup4
# pip install MechanicalSoup
# Import libraries
import mechanicalsoup
import urllib.request
import requests
from bs4 import BeautifulSoup
# Create a browser object that can collect cookies
browser = mechanicalsoup.StatefulBrowser()
browser.open("https://www.ons.gov.uk/economy/grossdomesticproductgdp/timeseries/l2kq/qna")
browser.download_link("https://www.ons.gov.uk/generator?format=xls&uri=/economy/grossdomesticproductgdp/timeseries/l2kq/qna")
For the last line, I have also tried:
browser.download_link(link="https://www.ons.gov.uk/generator?format=xls&uri=/economy/grossdomesticproductgdp/timeseries/l2kq/qna",file="c:/test/filename.xls")
Update, 25 January 2019: Thanks to AKX's comment below, I have also tried
browser.download_link(re.escape("https://www.ons.gov.uk/generator?format=xls&uri=/economy/grossdomesticproductgdp/timeseries/l2kq/qna"))
In each case, I get the error message:
mechanicalsoup.utils.LinkNotFoundError
However, the link does exist. Try pasting it into your address bar to confirm:
https://www.ons.gov.uk/generator?format=xls&uri=/economy/grossdomesticproductgdp/timeseries/l2kq/qna
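What am I doing wrong?
Update 2, 25 January 2019: Thanks to AKX's answer below, here is a complete MWE that answers my question (posted for anyone who runs into the same difficulty later):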
# Install dependencies (MechanicalSoup pulls in requests and beautifulsoup4)
# pip install MechanicalSoup
# Import libraries
import mechanicalsoup
# Create a browser object that can collect cookies
browser = mechanicalsoup.StatefulBrowser()
# Open the page that contains the download links
browser.open("https://www.ons.gov.uk/economy/grossdomesticproductgdp/timeseries/l2kq/qna")
# Find the link whose visible text is exactly ".xls" and save its target to disk
browser.download_link(link_text=".xls", file="c:/py/ONS_Data.xls")
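For anyone wondering why the earlier attempts raised LinkNotFoundError: as far as I understand it, when the first argument to download_link is a string, MechanicalSoup treats it as a regular expression and searches the href attributes of the links on the currently open page, not the web at large. The sketch below reproduces that matching with the standard library only. The href value is an assumption about how the link appears in the page's HTML (a relative path with no scheme or host); inspect the real page to confirm.

```python
import re

# ASSUMPTION: the page stores the download link as a relative href like this.
href = "/generator?format=xls&uri=/economy/grossdomesticproductgdp/timeseries/l2kq/qna"
full_url = "https://www.ons.gov.uk" + href

# Attempt 1: the full URL used directly as a pattern. "?" acts as a regex
# quantifier here, and the href contains no scheme or host anyway.
unescaped_match = re.search(full_url, href)

# Attempt 2: re.escape() makes "?" literal, but the pattern still demands
# "https://www.ons.gov.uk", which a relative href does not contain.
escaped_match = re.search(re.escape(full_url), href)

# A pattern describing only a fragment of the href does match.
fragment_match = re.search(r"format=xls", href)

print(unescaped_match, escaped_match, fragment_match is not None)
```

If that assumption holds, matching on the link text (as in the MWE above) or on a short fragment of the href should find the link, whereas the absolute URL never will.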