Selenium function with multiprocessing

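I have written a Selenium-based function and want it to parse several web pages at the same time. To save time, I pass it a list of the URLs I want to scrape concurrently.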

I created a scraper.py file containing the scraper function:

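    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    def parser_od(url):
        price = []
        driver.get(url)
        try:
            # Strip spaces and the currency suffix, and use a decimal point
            price.append(driver.find_element_by_xpath("//*[@id='root']/article/header/div[2]/div[1]/div[2]").text.replace(" ", "").replace("zł", "").replace(",", "."))
        except NoSuchElementException:
            # The price element is missing on this page
            price.append("")
        return price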

Now I want to use this function with the multiprocessing library to parse several URLs from my list at once:

    from multiprocessing import Pool
    from scraper import *

    url_list = ['https://www.otodom.pl/oferta/2-duze-pokoje-we-wrzeszczu-do-zamieszania-ID42f6s',
                'https://www.otodom.pl/oferta/mieszkanie-na-zamknietym-osiedlu-z-ogrodkiem-ID40ZxM',
                'https://www.otodom.pl/oferta/zaciszna-nowe-mieszkanie-3-pokoje-0-ID41UaX',
                'https://www.otodom.pl/oferta/dwupoziomowe-dewel-mieszkanie-101-m2-lebork-i-p-ID3JEcQ']

    driver = webdriver.Chrome(executable_path=r"C:\Users\Admin\chromedriver.exe")

    with Pool(4) as p:
        price = p.map(parser_od, url_list)

But I get the following error:

NameError: name 'driver' is not defined

This is strange, because Chrome does open.

Edit: I need the browser open while this scraper runs, so the driver should be opened on every call to this function.
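
A minimal sketch of why the NameError can appear even though Chrome opens, assuming the driver is created in the calling script while parser_od is defined in scraper.py. A Python function resolves global names in the module where it was defined, not in the module that calls it. The in-memory scraper module and the example.com URL below are stand-ins for illustration, and no Selenium is needed to run it.

```python
import types

# Hypothetical in-memory stand-in for scraper.py (no Selenium needed).
scraper = types.ModuleType("scraper")
exec(
    "def parser_od(url):\n"
    "    driver.get(url)  # 'driver' is looked up in scraper's globals\n",
    scraper.__dict__,
)

# Roughly what `from scraper import *` does: copy the name into this namespace.
parser_od = scraper.parser_od

# Defined here, in the calling script, so scraper's globals never see it.
driver = object()

try:
    parser_od("https://example.com")
    message = "no error"
except NameError as exc:
    message = str(exc)

print(message)  # name 'driver' is not defined
```

Creating the driver inside parser_od, or in scraper.py itself, puts the name where the function looks for it.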

ITMISS

1 answer

呼啦一阵风

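You just need to split the URL list into 4 equal parts and create a separate driver for each process in the Pool. The worker also has to be a named top-level function rather than a lambda, because Pool pickles it to send it to the worker processes.

    from functools import partial
    from multiprocessing import Pool
    from selenium import webdriver
    from selenium.common.exceptions import NoSuchElementException

    def parser_od(worker_index, urls, n_workers=4):
        # Each worker process starts its own browser
        driver = webdriver.Chrome(executable_path=r"C:\Users\Admin\chromedriver.exe")
        prices = []
        try:
            # Worker k handles URLs k, k + n_workers, k + 2 * n_workers, ...
            for url in urls[worker_index::n_workers]:
                driver.get(url)
                try:
                    prices.append(driver.find_element_by_xpath("//*[@id='root']/article/header/div[2]/div[1]/div[2]").text.replace(" ", "").replace("zł", "").replace(",", "."))
                except NoSuchElementException:
                    prices.append("")
        finally:
            driver.quit()  # close this worker's browser
        return prices

    if __name__ == "__main__":
        with Pool(4) as p:
            # partial of a top-level function is picklable, unlike a lambda
            price = p.map(partial(parser_od, urls=url_list), range(4))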
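
Spreading the URLs over workers this way means p.map hands back one list of results per worker rather than one result per URL. Below is a minimal sketch of how the split and the merge back into input order could look, assuming a round-robin split (worker k takes every n-th URL starting at k). The helper names split_round_robin and merge_round_robin and the placeholder URLs are hypothetical, and upper-casing stands in for the actual scraping.

```python
def split_round_robin(items, n_parts):
    """Part k receives items k, k + n_parts, k + 2 * n_parts, ..."""
    return [items[k::n_parts] for k in range(n_parts)]


def merge_round_robin(parts, total):
    """Undo split_round_robin so results line up with the original order."""
    merged = [None] * total
    for k, part in enumerate(parts):
        # Extended slice assignment: part k fills positions k, k + n, ...
        merged[k::len(parts)] = part
    return merged


urls = ["url-0", "url-1", "url-2", "url-3", "url-4"]  # placeholder URLs
parts = split_round_robin(urls, 4)
# Stand-in for the per-worker scraping step: one result list per part.
per_worker = [[u.upper() for u in part] for part in parts]
results = merge_round_robin(per_worker, len(urls))

print(parts)
print(results)
```

The merge assumes every worker returns exactly one result per URL it was given, which is why a missing price is recorded as an empty string instead of being skipped.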
