My web scraper is stuck in a loop


I'm building a web scraper, and when I try to scrape a page of data, it keeps loading the same information over and over.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.realtor.com/realestateagents/phoenix_az'

# open the connection and grab the page
uClient = uReq(my_url)

# read the page
page_html = uClient.read()

# close the connection
uClient.close()

# parse the HTML
page_soup = soup(page_html, "html.parser")

# find all realtors on the page
containers = page_soup.findAll("div", {"class": "agent-list-card clearfix"})

for container in containers:
    name = page_soup.find('div', class_='agent-name text-bold')
    agent_name = name.text.strip()
    number = page_soup.find('div', class_='agent-phone hidden-xs hidden-xxs')
    agent_number = number.text.strip()
    print("name: " + agent_name)
    print("number: " + agent_number)

Asked by 慕的地10843

1 answer

红颜莎娜

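The fix is to search within container inside the loop instead of within page_soup. page_soup.find() always returns the first match on the whole page, so every iteration prints the same agent. You should also check that find() actually returned a result, or catch the exception raised when it did not.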
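A minimal sketch of that fix, under a few assumptions: the helper name extract_agents and the SAMPLE_HTML markup are hypothetical stand-ins (the real page is not fetched here), and each lookup matches on a single class name (for example agent-name) rather than the full class string from the question, which tolerates extra classes on the element.

```python
from bs4 import BeautifulSoup


def extract_agents(page_html):
    """Return a list of (name, phone) tuples, one per agent card."""
    page_soup = BeautifulSoup(page_html, "html.parser")
    agents = []
    for container in page_soup.find_all("div", class_="agent-list-card"):
        # Search inside this card, not the whole page.
        name = container.find("div", class_="agent-name")
        number = container.find("div", class_="agent-phone")
        # find() returns None when nothing matches, so guard before reading text.
        agent_name = name.get_text(strip=True) if name else None
        agent_number = number.get_text(strip=True) if number else None
        agents.append((agent_name, agent_number))
    return agents


# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<div class="agent-list-card clearfix">
  <div class="agent-name text-bold"> Alice Example </div>
  <div class="agent-phone hidden-xs hidden-xxs"> (555) 010-0001 </div>
</div>
<div class="agent-list-card clearfix">
  <div class="agent-name text-bold"> Bob Example </div>
</div>
"""

for agent_name, agent_number in extract_agents(SAMPLE_HTML):
    print("name:", agent_name or "(missing)")
    print("number:", agent_number or "(missing)")
```

The second card has no phone element, so the None check is what keeps the loop from failing with an AttributeError on that card.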

