I'm trying to build a bot that uses Selenium and Python 3 to download videos from a website called "Sdarot".
Every video (episode) on the site has its own page and URL. When you open an episode, you have to wait 30 seconds for it to "load" before the <video> tag appears in the page's HTML.
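Instead of sleeping for a fixed 30 seconds, `WebDriverWait(driver, timeout).until(...)` accepts any callable that takes the driver and keeps polling until it returns a truthy value, which `until` then returns. A minimal sketch of such a condition, assuming a Selenium 4 driver (where `find_elements(by, value)` exists and `"tag name"` is the string value of `By.TAG_NAME`) and assuming the `src` is set directly on the `<video>` element rather than on a `<source>` child. The name `video_src_available` is hypothetical.

```python
def video_src_available(driver):
    """Hypothetical wait condition for WebDriverWait.until().

    Returns the src of the first <video> element that has a non-empty src,
    or False so that WebDriverWait keeps polling.
    Assumes a Selenium 4 driver; "tag name" is the value of By.TAG_NAME.
    """
    for video in driver.find_elements("tag name", "video"):
        src = video.get_attribute("src")
        if src:  # empty string / None means the player is not ready yet
            return src
    return False  # falsy: tell WebDriverWait to try again
```

Usage would look like `video_url = WebDriverWait(driver, 45).until(video_src_available)`; if the site puts the URL on a `<source>` tag instead, the selector would need adjusting.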
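The problem is that requests for the video are encrypted or protected in some way (I don't really understand how it works)! When I simply wait for the video tag to appear and then download the video with the urllib library (see the code below), I get the following error: urllib.error.HTTPError: HTTP Error 401: Unauthorized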
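A 401 from urllib while the same link works inside the Selenium browser suggests (an assumption, not something the post confirms) that the server expects session state the browser has but a bare `urlretrieve` call lacks: cookies, a browser User-Agent, and possibly a Referer. A minimal sketch that copies that state onto a `urllib.request.Request`; `build_authenticated_request` is a hypothetical helper, and its `selenium_cookies` argument is assumed to be the list of dicts returned by `driver.get_cookies()`.

```python
import urllib.request


def build_authenticated_request(video_url, selenium_cookies, user_agent, referer):
    """Hypothetical helper: build a Request that mimics the browser session.

    selenium_cookies: list of dicts like those from driver.get_cookies(),
    each with at least "name" and "value" keys.
    """
    # Join the cookies into one header: "name1=value1; name2=value2"
    cookie_header = "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)
    headers = {
        "Cookie": cookie_header,
        "User-Agent": user_agent,  # same UA the browser used
        "Referer": referer,        # the episode page that embeds the video
    }
    return urllib.request.Request(video_url, headers=headers)
```

In the bot the inputs could come from `driver.get_cookies()`, `driver.execute_script("return navigator.userAgent")` and `driver.current_url`. Note that `get_cookies()` only returns cookies for the current page's domain; if the video is served from another host, or the site uses a token bound to something else, this may still not be enough.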
I should mention that when I open the video's download link in the Selenium-driven browser, it opens perfectly fine and I can download it manually.
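Because the link only works with the browser's session, the download step needs to send a request that carries that session rather than a plain URL. Once such a request exists, a minimal sketch for saving it is to open it with `urllib.request.urlopen` and stream the body to disk with `shutil.copyfileobj`, so a large video is never held in memory at once. `download_to_file` is a hypothetical name, and the 1 MiB chunk size is an arbitrary choice.

```python
import shutil
import urllib.request


def download_to_file(request, filename, chunk_size=1024 * 1024):
    """Hypothetical helper: stream the body of `request` into `filename`.

    `request` may be a URL string or a urllib.request.Request that already
    carries the browser's cookies/headers. HTTP 4xx/5xx responses raise
    urllib.error.HTTPError, just like urlretrieve does.
    """
    with urllib.request.urlopen(request) as response, open(filename, "wb") as out:
        # copyfileobj reads and writes chunk_size bytes at a time
        shutil.copyfileobj(response, out, chunk_size)
    return filename
```

Unlike `urlretrieve`, this accepts a `Request` object, so any headers attached to it are actually sent.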
How can I download the video automatically? Thanks in advance!
Code:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
import urllib.request


def load(driver, url):
    driver.get(url)  # open the page in the browser
    try:
        # wait for the episode to "load"
        # if something is wrong and the episode doesn't load after 45 seconds,
        # the function will call itself again and try to load again.
        continue_btn = WebDriverWait(driver, 45).until(
            EC.element_to_be_clickable((By.ID, "proceed"))
        )
    except TimeoutException:
        load(driver, url)  # retry with the same driver and URL


def save_video(driver, filename):
    # get the video element (Selenium 4 API; find_element_by_* was removed)
    video_element = driver.find_element(By.TAG_NAME, "video")
    video_url = video_element.get_property('src')  # get the video url
    # trying to download the video
    urllib.request.urlretrieve(video_url, filename)
    # ERROR: "urllib.error.HTTPError: HTTP Error 401: Unauthorized"


def main():
    URL = r'https://www.sdarot.dev/watch/339-%D7%94%D7%A4%D7%99%D7%92-%D7%9E%D7%95%D7%AA-ha-pijamot/season/1/episode/23'
    DRIVER = webdriver.Chrome()
    load(DRIVER, URL)
    save_video(DRIVER, "video.mp4")  # save_video returns nothing


if __name__ == "__main__":
    main()
```
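The `load()` function retries by calling itself recursively with no upper limit, so a page that never loads would recurse until Python's recursion limit is hit. A minimal sketch of a bounded alternative, assuming the waiting logic is moved into a single-attempt function; `retry` is a hypothetical helper, not part of Selenium.

```python
import time


def retry(action, attempts=3, delay=0.0, exceptions=(Exception,)):
    """Hypothetical helper: call action() up to `attempts` times.

    Returns action()'s result on the first success; if every attempt raises
    one of `exceptions`, the last error is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: surface the real error
            time.sleep(delay)  # optional pause before trying again
```

With a hypothetical non-recursive `load_once(driver, url)` that does `driver.get(url)` plus the `WebDriverWait`, the call would be `retry(lambda: load_once(driver, url), attempts=3, exceptions=(TimeoutException,))`.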
慕哥6287543