
Python crawler: I set up deduplication, but after a few pages it keeps crawling the same URL endlessly. The deduplication code is below.

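class UrlManager(object):
    def __init__(self):
        self.new_urls = set()  # URLs waiting to be crawled
        self.old_urls = set()  # URLs already handed out for crawling

    def add_new_url(self, url):
        if url is None:
            return
        # Queue the URL only if it is neither pending nor already crawled
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        if urls is None or len(urls) == 0:
            return
        for url in urls:
            # Adds directly to the pending set; old_urls is not checked here
            self.new_urls.add(url)

    def has_new_url(self):
        return len(self.new_urls) != 0

    def get_new_url(self):
        # Take an arbitrary pending URL and mark it as crawled
        new_url = self.new_urls.pop()
        self.old_urls.add(new_url)
        return new_url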

慕桂英5878391
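
Judging only from the code posted, one likely cause is add_new_urls: it adds every URL straight into new_urls without checking old_urls, so a page that links back to an already crawled page puts that page in the queue again. Below is a minimal sketch of one way to close that gap, assuming the crawler passes each page's links through add_new_urls. DedupUrlManager, crawl, and the in-memory links dict are hypothetical names used for illustration and are not part of the original post.

```python
class DedupUrlManager(object):
    """Hypothetical variant of UrlManager; only add_new_urls differs."""

    def __init__(self):
        self.new_urls = set()
        self.old_urls = set()

    def add_new_url(self, url):
        if url is None:
            return
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def add_new_urls(self, urls):
        if not urls:
            return
        for url in urls:
            # Reuse the single-URL check so crawled URLs are not queued again.
            self.add_new_url(url)

    def has_new_url(self):
        return len(self.new_urls) != 0

    def get_new_url(self):
        new_url = self.new_urls.pop()
        self.old_urls.add(new_url)
        return new_url


def crawl(manager, links, root, limit=20):
    """Simulate a crawl over an in-memory link graph (stand-in for real pages)."""
    manager.add_new_url(root)
    visited = []
    while manager.has_new_url() and len(visited) < limit:
        url = manager.get_new_url()
        visited.append(url)
        manager.add_new_urls(links.get(url, []))
    return visited


# Two hypothetical pages that link to each other.
links = {"/a": ["/b"], "/b": ["/a"]}
visited = crawl(DedupUrlManager(), links, "/a")
print(visited)
```

With the check in place, the simulated crawl stops once both pages have been visited. If the real crawler still loops after a change like this, the same page may be reachable under different URL strings (for example with and without a trailing slash or query string), which a plain set comparison treats as different URLs.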