I'm trying to scrape proxies and their ports. I can get the proxy addresses, but I can't get the ports. Please help.

import requests
from bs4 import BeautifulSoup as bs

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36"}
url = "https://www.proxyscan.io/"

r=requests.get(url,headers=headers)
soup = bs(r.content,"html.parser")

a = soup.findAll(scope="row")
a = str(a).replace("<th scope=\"row\">", "").replace("</th>", "").replace("[","").replace("]","").replace(" ","")
a = a.split(",")

for proxy in a:
    print(proxy)


慕桂英546537
2 answers

潇潇雨雨

import requests
from bs4 import BeautifulSoup as bs

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36"}
url = "https://www.proxyscan.io/"

r=requests.get(url,headers=headers)
soup = bs(r.content,"html.parser")

a = soup.findAll(scope="row")
a = str(a).replace("<th scope=\"row\">", "").replace("</th>", "").replace("[","").replace("]","").replace(" ","")
a = a.split(",")

for proxy in a:
    print(proxy)

守候你守候我

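You can use find_next_siblings() to get the tags that follow an element. If you look closely at the parsed HTML, the port sits in the tag immediately after the proxy address. So loop over the variable a and look up the neighbouring tag for each entry. Note that a has to be the result of soup.findAll(scope="row") itself, before your code turns it into a string and splits it, because find_next_siblings() only exists on tag objects. Take the first element of the list that find_next_siblings() returns. It should look something like <td>4145</td>. Strip the HTML tags from it, or extract the string from the td, and you should have the port number.

for i in a:
    full = i.find_next_siblings()[0]
    port = str(full).replace("<td>","")
    port = str(port).replace("</td>", "")
    print(port)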
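As a rough sketch of the same idea, the address and the port can be paired in one pass, with get_text() replacing the manual tag stripping. The sample markup, the example addresses, and the helper name extract_proxies are assumptions made for illustration. The real page is only assumed to put each address in a <th scope="row"> followed by a <td> holding the port, so check the actual HTML before relying on this.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the live page is only assumed to look like this.
SAMPLE_HTML = """
<table>
  <tr>
    <th scope="row">192.0.2.10</th>
    <td>4145</td>
    <td>US</td>
  </tr>
  <tr>
    <th scope="row">198.51.100.7</th>
    <td>8080</td>
    <td>DE</td>
  </tr>
</table>
"""


def extract_proxies(html):
    """Return "address:port" strings found in proxy-table HTML (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for cell in soup.find_all("th", scope="row"):
        # The first <td> after the address cell is assumed to hold the port.
        port_cell = cell.find_next_sibling("td")
        if port_cell is None:
            continue  # skip rows that have no port cell
        address = cell.get_text(strip=True)
        port = port_cell.get_text(strip=True)
        results.append(f"{address}:{port}")
    return results


for proxy in extract_proxies(SAMPLE_HTML):
    print(proxy)
```

To try it on the real site, pass r.content from the question's requests.get() call instead of SAMPLE_HTML. Working on the tag objects rather than on str(a) keeps each address attached to its own row, which is what makes the sibling lookup possible.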