How to extract data from HTML using Beautiful Soup


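I am trying to scrape a web page and store the results in a CSV/Excel file. I am using Beautiful Soup.

I am trying to extract data from the soup with the find_all function, but I am not sure how to capture the data held in the field name or the title.

The HTML has the following format: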

<a class="font20 c_name_head weight700 detail_page"
   href="/companies/view/1033/nimblechapps-pvt-ltd" target="_blank"
   title="Nimblechapps Pvt. Ltd.">
<span itemprop="name">Nimblechapps Pvt. Ltd. </span>
</a> </h3>

This is my code so far. I am not sure how to proceed from here.

from bs4 import BeautifulSoup as BS
import requests

page = 'https://www.goodfirms.co/directory/platform/app-development/iphone?page=2'
res = requests.get(page)
cont = BS(res.content, "html.parser")

# Attempt 1: match on the full class string
names = cont.find_all(class_='font20 c_name_head weight700 detail_page')
# Attempt 2: the same match, restricted to <a> tags
names = cont.find_all('a', attrs={'class': 'font20 c_name_head weight700 detail_page'})
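One detail about the two find_all calls above may be worth illustrating. When the class filter is a string containing several class names, Beautiful Soup compares it against the exact value of the class attribute, so a link that lists the same classes in a different order is not matched. The sketch below parses an inline sample instead of the live page. The first link is modelled on the markup in the question; the second link ("Example Co.") and the shortened selector are assumptions made only for illustration.

```python
from bs4 import BeautifulSoup

# Inline sample: the first link mirrors the question's markup,
# the second one is hypothetical and lists its classes in another order.
html = """
<h3><a class="font20 c_name_head weight700 detail_page"
       href="/companies/view/1033/nimblechapps-pvt-ltd" target="_blank"
       title="Nimblechapps Pvt. Ltd.">
  <span itemprop="name">Nimblechapps Pvt. Ltd. </span>
</a></h3>
<h3><a class="c_name_head detail_page font20"
       href="/companies/view/0000/example-co" title="Example Co.">
  <span itemprop="name">Example Co. </span>
</a></h3>
"""

soup = BeautifulSoup(html, "html.parser")

# A multi-class string is compared with the exact class attribute value,
# so the reordered second link is missed.
exact = soup.find_all("a", class_="font20 c_name_head weight700 detail_page")

# A CSS selector only requires that each listed class be present.
links = soup.select("a.c_name_head.detail_page")

print(len(exact), len(links))
```

If the site's markup is as uniform as the snippet suggests, the exact-string form will still work; the selector is just less sensitive to changes in class order or to extra classes.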

I have tried the following:

Input: cont.h3.a.span

Output: <span itemprop="name">Nimblechapps Pvt. Ltd.</span>

I want to extract the company name: "Nimblechapps Pvt. Ltd."
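One possible way to get from the tag to the plain company name is sketched below. It assumes the page repeats the markup shown in the question for every company, and it parses an inline copy of that snippet rather than fetching the live page. The text inside the span is read with get_text(strip=True), and the title and href attributes are read with get(). The column names and the in-memory CSV buffer are choices made for this example only.

```python
import csv
import io

from bs4 import BeautifulSoup

# Inline copy of the snippet from the question (with an opening <h3> added).
html = """<h3><a class="font20 c_name_head weight700 detail_page"
href="/companies/view/1033/nimblechapps-pvt-ltd" target="_blank"
title="Nimblechapps Pvt. Ltd.">
<span itemprop="name">Nimblechapps Pvt. Ltd. </span>
</a></h3>"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for link in soup.find_all("a", class_="c_name_head"):
    span = link.find("span", itemprop="name")
    # get_text(strip=True) returns the text without surrounding whitespace;
    # fall back to the title attribute if the span is missing.
    name = span.get_text(strip=True) if span else link.get("title", "")
    rows.append({"name": name, "title": link.get("title"), "href": link.get("href")})

# Write the rows as CSV into an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "title", "href"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

To write a real file instead, the buffer could be swapped for open("companies.csv", "w", newline="", encoding="utf-8"), where the file name is again only a placeholder. For the single-tag attempt in the question, cont.h3.a.span.get_text(strip=True) or cont.h3.a["title"] should give the same name, provided the first h3 on the page is a company entry.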

Asked by 陪伴而非守候