Why does Python BeautifulSoup return an empty list?

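I'm a beginner IT student trying to help a friend with his work. I want to build a list of customers he could serve (exporting it to a file would also be great, but I'll think about that later, I guess).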


When I run the code, it just returns an empty list. Do you have any suggestions?


Any advice or feedback would be greatly appreciated!


Thank you! (I know this may not be the best code you've ever seen, so I apologize in advance!)


import requests
from bs4 import BeautifulSoup
import pprint

res = requests.get('https://www.paginebianche.it/toscana/li/gommisti.html')
res2 = requests.get('https://www.paginebianche.it/ricerca?qs=gommisti&dv=li&p=2')
soup = BeautifulSoup(res.text, 'html.parser')
soup2 = BeautifulSoup(res2.text, 'html.parser')

links = soup.select('.org fn')
subtext = soup.select('.address')
links2 = soup2.select('.org fn')
subtext2 = soup2.select('.address')

mega_links = links + links2
mega_subtext = subtext + subtext2

def create_custom_hn(mega_links,mega_subtext):
  hn = []
  for links,address in enumerate(mega_links):
    title = links.getText()
    address= address.getText()
    hn.append({'title': title, 'address': address})
  return hn

pprint.pprint(create_custom_hn(mega_links,mega_subtext))


MYYA
1 Answer

忽然笑

The selector '.org fn' is wrong. It looks for <fn> tags nested inside an element with class org, and there are none, so the list is empty. It should be '.org.fn', which selects all elements that have both the classes org and fn. However, some items have no .address, so your code would produce skewed results, with titles paired to the wrong addresses. You can use this example to get the titles and addresses (if an address is missing, - is used instead):

import pprint
import requests
from itertools import chain
from bs4 import BeautifulSoup

res = requests.get('https://www.paginebianche.it/toscana/li/gommisti.html')
res2 = requests.get('https://www.paginebianche.it/ricerca?qs=gommisti&dv=li&p=2')
soup = BeautifulSoup(res.text, 'html.parser')
soup2 = BeautifulSoup(res2.text, 'html.parser')

hn = []
# Walk each listing once so every title stays paired with its own address.
for i in chain.from_iterable([soup.select('.item'), soup2.select('.item')]):
    title = i.h2.getText(strip=True)
    addr = i.select_one('[itemprop="address"]')
    # select_one returns None when the listing has no address.
    addr = addr.getText(strip=True, separator='\n') if addr else '-'
    hn.append({'title': title, 'address': addr})

pprint.pprint(hn)

Prints:

[{'address': 'Via Don Giovanni Minzoni 44\n-\n57025\nPiombino (LI)',
  'title': 'CENTROGOMMA'},
 {'address': 'Via Quaglierini 14\n-\n57123\nLivorno (LI)',
  'title': 'F.LLI CAPALDI'},
 {'address': 'Via Ugione 9\n-\n57121\nLivorno (LI)',
  'title': 'PNEUMATICI INTERGOMMA GOMMISTA'},
 {'address': "Viale Carducci Giosue' 88/90\n-\n57124\nLivorno (LI)",
  'title': 'ITALMOTORS'},
 {'address': 'Piazza Chiesa 53\n-\n57124\nLivorno (LI)',
  'title': 'Lo Coco Pneumatici'},
 {'address': '-', 'title': 'PIERO GOMME'},
 {'address': 'Via Pisana Livornese Nord 95\n-\n57014\nVicarello (LI)',
  'title': 'GOMMISTA TRAVAGLINI PNEUMATICI'},
 {'address': 'Via Cimarosa 165\n-\n57124\nLivorno (LI)',
  'title': 'GOMMISTI CIONI AUTORICAMBI & SERVIZI'},
 {'address': 'Loc. La Cerretella, 219\n-\n57022\nCastagneto Carducci (LI)',
  'title': 'AURELIA GOMME'},
 {'address': 'Strada Provinciale Vecchia Aurelia 243\n'
             '-\n'
             '57022\n'
             'Castagneto Carducci (LI)',
  'title': 'AURELIA GOMME DI GIANNELLI SIMONE'},

...and so on.
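
To see the selector difference without hitting the network, here is a minimal offline sketch. The HTML in SAMPLE_HTML is made up for illustration and is only an assumption about the general shape of a listing, not the site's real markup; the helper name extract_entries is likewise hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this illustration; the real page differs.
SAMPLE_HTML = """
<div class="item">
  <h2><span class="org fn">ALPHA GOMME</span></h2>
  <div class="address" itemprop="address"><span>Via Roma 1</span><span>57100</span></div>
</div>
<div class="item">
  <h2><span class="org fn">BETA GOMME</span></h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# '.org fn' is a descendant selector: <fn> tags inside an element with
# class "org". No such tag exists, so nothing matches.
descendant_matches = soup.select(".org fn")

# '.org.fn' is a compound selector: elements carrying both classes.
compound_matches = soup.select(".org.fn")


def extract_entries(soup):
    """Collect one dict per listing, using '-' when the address is missing."""
    entries = []
    for item in soup.select(".item"):
        title = item.h2.get_text(strip=True)
        addr = item.select_one('[itemprop="address"]')  # None if absent
        address = addr.get_text(strip=True, separator="\n") if addr else "-"
        entries.append({"title": title, "address": address})
    return entries


entries = extract_entries(soup)
print(len(descendant_matches), len(compound_matches))
print(entries)
```

Iterating per listing also avoids pairing two separately selected lists, which fall out of step as soon as one entry lacks an address.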