Chrome crashes after a few hours when multiprocessing with Selenium through Python

This is the error traceback after a few hours of scraping:


The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.

This is my Selenium Python setup:


#scrape.py
from selenium import webdriver
from selenium.common.exceptions import *
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.chrome.options import Options

def run_scrape(link):
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument("--headless")
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument("--lang=en")
    chrome_options.add_argument("--start-maximized")
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36")
    chrome_options.binary_location = "/usr/bin/google-chrome"
    browser = webdriver.Chrome(executable_path=r'/usr/local/bin/chromedriver', options=chrome_options)
    browser.get(link)  # the link passed in by the pool
    try:
        pass  # scrape process
    except:
        pass  # other stuffs
    browser.quit()

#multiprocess.py
import time
from multiprocessing import Pool
from scrape import run_scrape

if __name__ == '__main__':
    start_time = time.time()
    links = []  # list of links to be scraped
    pool = Pool(20)
    results = pool.map(run_scrape, links)
    pool.close()
    pool.join()  # wait for the worker processes to exit
    print("Total Time Processed: "+"--- %s seconds ---" % (time.time() - start_time))

Chrome, ChromeDriver, and Selenium versions:


ChromeDriver 79.0.3945.36 (3582db32b33893869b8c1339e8f4d9ed1816f143-refs/branch-heads/3945@{#614})

Google Chrome 79.0.3945.79

Selenium Version: 4.0.0a3

I would like to know why Chrome is shutting down while the other processes keep working.


九州编程
2 Answers

慕莱坞森

I took your code, modified it slightly to suit my test environment, and here are the execution results.

Code block:

multiprocess.py:

import time
from multiprocessing import Pool
from multiprocessingPool.scrape import run_scrape

if __name__ == '__main__':
    start_time = time.time()
    links = ["https://selenium.dev/downloads/", "https://selenium.dev/documentation/en/"]
    pool = Pool(2)
    results = pool.map(run_scrape, links)
    pool.close()
    print("Total Time Processed: "+"--- %s seconds ---" % (time.time() - start_time))

scrape.py:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

def run_scrape(link):
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument("--headless")
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument("--lang=en")
    chrome_options.add_argument("--start-maximized")
    chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
    chrome_options.add_experimental_option('useAutomationExtension', False)
    chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36")
    chrome_options.binary_location=r'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
    browser = webdriver.Chrome(executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe', options=chrome_options)
    browser.get(link)
    try:
        print(browser.title)
    except (NoSuchElementException, TimeoutException):
        print("Error")
    browser.quit()

Console output:

Downloads
The Selenium Browser Automation Project :: Documentation for Selenium
Total Time Processed: --- 10.248600006103516 seconds ---

Conclusion

It is pretty evident that your program is logically sound.

This use case

Since you mention that the error only surfaces after a few hours of scraping, I suspect it is because WebDriver is not thread-safe. That said, if you can serialize access to the underlying driver instance, you can share one reference across several threads. This is not advisable. You can, however, always instantiate one WebDriver instance per thread.

Ideally, the thread-safety issue is not in your code but in the actual browser bindings. They all assume there will be only one command at a time (like a real user). On the other hand, you can always instantiate one WebDriver instance per thread, which will launch multiple browsing tabs/windows. Up to this point your program looks fine.

Different threads can run against the same WebDriver, but then the results of the tests will not be what you expect. The reason is that when you use multithreading to run different tests on different tabs/windows, a little thread-safety coding is required. Otherwise actions such as click() or send_keys() go to whichever tab/window currently has focus, regardless of which thread you expected to be running. In effect, all the tests run simultaneously on the one tab/window that has focus rather than on the intended tab/window.
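
To illustrate what "serializing access" to a shared driver could look like, here is a minimal sketch. It is not taken from the answer: SerializedDriver and FakeDriver are hypothetical names, and FakeDriver only stands in for webdriver.Chrome so that the sketch runs without a browser. The point is that the lock has to cover the whole navigate-then-read sequence, not each call separately.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SerializedDriver:
    """Hypothetical wrapper: lets only one thread use the driver at a time."""

    def __init__(self, driver):
        self._driver = driver
        self._lock = threading.Lock()

    def fetch_title(self, link):
        # Hold the lock across get() and the title read, so that another
        # thread cannot navigate the shared browser in between.
        with self._lock:
            self._driver.get(link)
            return self._driver.title


class FakeDriver:
    """Stand-in for webdriver.Chrome (assumption: only get/title/quit used)."""

    def __init__(self):
        self.title = None

    def get(self, link):
        self.title = "title of " + link

    def quit(self):
        pass


if __name__ == "__main__":
    shared = SerializedDriver(FakeDriver())
    links = ["https://selenium.dev/downloads/",
             "https://selenium.dev/documentation/en/"]
    with ThreadPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(shared.fetch_title, links)))
```

Because every page load waits for the lock, this gives no speed-up over a single thread, which is one reason the answer recommends one WebDriver instance per thread instead.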

HUX布斯

For now I am using the threading module to instantiate one WebDriver per thread:

import threading
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

threadLocal = threading.local()

def get_driver():
    # Reuse this thread's browser if it already has one
    browser = getattr(threadLocal, 'browser', None)
    if browser is None:
        chrome_options = Options()
        chrome_options.add_argument('--no-sandbox')
        chrome_options.add_argument("--headless")
        chrome_options.add_argument('--disable-dev-shm-usage')
        chrome_options.add_argument("--lang=en")
        chrome_options.add_argument("--start-maximized")
        chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
        chrome_options.add_experimental_option('useAutomationExtension', False)
        chrome_options.add_argument("user-agent=Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36")
        chrome_options.binary_location = "/usr/bin/google-chrome"
        browser = webdriver.Chrome(executable_path=r'/usr/local/bin/chromedriver', options=chrome_options)
        setattr(threadLocal, 'browser', browser)
    return browser

It really did help me scrape faster than running one driver at a time.
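
The per-thread approach above never shows where the browsers get closed, so here is one possible way to combine it with a thread pool and a cleanup step. This is a sketch under assumptions, not the poster's actual code: ThreadLocalDrivers, scrape_all, and make_driver are hypothetical names, and make_driver is any zero-argument callable that returns an object with get(), title, and quit().

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ThreadLocalDrivers:
    """Hypothetical helper: one driver per thread, all tracked for cleanup."""

    def __init__(self, make_driver):
        self._make_driver = make_driver
        self._local = threading.local()
        self._created = []
        self._lock = threading.Lock()

    def get(self):
        # Create the driver the first time this thread asks for one.
        driver = getattr(self._local, "browser", None)
        if driver is None:
            driver = self._make_driver()
            self._local.browser = driver
            with self._lock:
                self._created.append(driver)
        return driver

    def quit_all(self):
        # Close every driver that was created, whichever thread made it.
        with self._lock:
            while self._created:
                self._created.pop().quit()


def scrape_all(links, make_driver, workers=4):
    drivers = ThreadLocalDrivers(make_driver)

    def scrape_one(link):
        driver = drivers.get()
        driver.get(link)
        return driver.title

    try:
        with ThreadPoolExecutor(max_workers=workers) as executor:
            return list(executor.map(scrape_one, links))
    finally:
        # Runs even if a page raised, so no browser is left behind.
        drivers.quit_all()
```

With Selenium installed, make_driver could be a function that builds the Chrome options shown above and returns webdriver.Chrome(...). The try/finally is the design choice worth noting: over a run lasting hours, a browser that is never quit after an exception keeps holding memory.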