
python3 + selenium + chrome driver + proxy IP: how to get rid of the authentication pop-up

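This post covers a Python crawler that uses selenium with chrome driver to collect data through a proxy IP, and how to get rid of the authentication pop-up. The runtime environment is as follows:
1. Programming language: Python 3
2. Make sure selenium is installed
3. Chrome does not need to be the latest version. It only has to match the version of the driver file you are going to download. When comparing, look at the first three segments of the Chrome version number, for example 100.0.4896
4. A crawler proxy or a proxy server address

Once the environment above is ready, the program is as follows: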

from selenium import webdriver

# proxy credentials: defined here, but --proxy-server only accepts host:port,
# so they never reach Chrome (this is what triggers the pop-up)
username = 'username'
password = 'password'
url = 'http://whatismyipaddress.com'
PROXY = "www.16yun.cn:8000"  # IP:PORT or HOST:PORT
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % PROXY)
chrome = webdriver.Chrome(options=chrome_options)
chrome.get(url)

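Every time the program above runs, a pop-up appears asking for a username and password. This is a common problem when using a proxy IP that requires authentication under selenium, because the --proxy-server argument carries only the host and port. The solution is to load a small Chrome extension that sets the proxy and supplies the credentials: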

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import os
import zipfile

url = 'https://whatismyipaddress.com/'  # target website
PROXY = 'www.16yun.cn'  # proxy server address
port = '31111'  # proxy server port
user = 'username'  # proxy server username
passw = 'password'  # proxy server password

# manifest.json of a small Chrome extension (Manifest V2)
manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    },
    "minimum_chrome_version":"22.0.0"
}
"""

# background.js: route all traffic through the proxy, and answer every
# authentication challenge (onAuthRequired) with the proxy credentials
background_js = """
var config = {
        mode: "fixed_servers",
        rules: {
        singleProxy: {
            scheme: "http",
            host: "%s",
            port: parseInt(%s)
        },
        bypassList: ["localhost"]
        }
    };

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%s",
            password: "%s"
        }
    };
}

chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
);
""" % (PROXY, port, user, passw)


def get_chromedriver(use_proxy=False, user_agent=None):
    # chromedriver is looked up in the same directory as this script
    path = os.path.dirname(os.path.abspath(__file__))
    chrome_options = webdriver.ChromeOptions()
    if use_proxy:
        # pack both files into a zip that Chrome can load as an extension
        pluginfile = 'proxy_auth_plugin.zip'
        with zipfile.ZipFile(pluginfile, 'w') as zp:
            zp.writestr("manifest.json", manifest_json)
            zp.writestr("background.js", background_js)
        chrome_options.add_extension(pluginfile)
    if user_agent:
        chrome_options.add_argument('--user-agent=%s' % user_agent)
    # recent Selenium 4 releases no longer accept a positional driver path or
    # chrome_options=; pass the path through a Service and use options=
    driver = webdriver.Chrome(
        service=Service(os.path.join(path, 'chromedriver')),
        options=chrome_options)
    return driver


driver = get_chromedriver(use_proxy=True)
driver.get(url)
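
The two extension files above are filled in with %-formatting, so a password that contains a double quote or a backslash would produce broken JavaScript. Below is a minimal sketch of an alternative, assuming the same Manifest V2 extension as above: write_proxy_auth_extension is a hypothetical helper name, and json.dumps is used so that the host and credentials are emitted as valid JavaScript literals.

```python
import json
import zipfile


def write_proxy_auth_extension(zip_path, host, port, user, password):
    """Hypothetical helper: write the same MV2 proxy-auth extension to zip_path."""
    manifest = {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                        "<all_urls>", "webRequest", "webRequestBlocking"],
        "background": {"scripts": ["background.js"]},
        "minimum_chrome_version": "22.0.0",
    }
    # proxy settings passed to chrome.proxy.settings.set
    config = {
        "mode": "fixed_servers",
        "rules": {
            "singleProxy": {"scheme": "http", "host": host, "port": int(port)},
            "bypassList": ["localhost"],
        },
    }
    # json.dumps escapes quotes/backslashes, so odd characters in the
    # password cannot break the generated script
    background = (
        "var config = %s;\n"
        'chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});\n'
        "chrome.webRequest.onAuthRequired.addListener(\n"
        "    function (details) { return {authCredentials: %s}; },\n"
        '    {urls: ["<all_urls>"]},\n'
        "    ['blocking']\n"
        ");\n"
    ) % (json.dumps(config), json.dumps({"username": user, "password": password}))
    with zipfile.ZipFile(zip_path, "w") as zp:
        zp.writestr("manifest.json", json.dumps(manifest, indent=2))
        zp.writestr("background.js", background)
    return zip_path
```

To use it, call chrome_options.add_extension(write_proxy_auth_extension('proxy_auth_plugin.zip', PROXY, port, user, passw)) inside get_chromedriver() instead of writing the zip inline.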

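The program above must be placed in the same directory as chromedriver.exe, because get_chromedriver() builds the driver path from the script's own directory. The extension file proxy_auth_plugin.zip is written to the current working directory, so that directory also needs read and write permission. If you copy this code, check the indentation and make sure the proxy credentials are correct. Also note that the extension uses Manifest V2, which recent Chrome releases have phased out, so on a current browser it may fail to load.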
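
Requirement 3 in the environment list (Chrome and the driver agreeing on the first three version segments, e.g. 100.0.4896) can be checked with a small helper. This is a minimal sketch, assuming you supply the two version strings yourself, for example from chrome://version and from the output of `chromedriver --version`; chrome_build and versions_compatible are hypothetical names.

```python
import re


def chrome_build(version_text):
    """Return the first three numeric segments (MAJOR.MINOR.BUILD) in a version string."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no version number in %r" % version_text)
    return tuple(int(part) for part in match.groups())


def versions_compatible(chrome_version, driver_version):
    """True when Chrome and chromedriver agree on the first three segments."""
    return chrome_build(chrome_version) == chrome_build(driver_version)
```

For example, versions_compatible("100.0.4896.60", "ChromeDriver 100.0.4896.20") is True, because only the fourth segment differs.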
