BeautifulSoup 4: Extracting multiple titles and links from different p tags



HTML code:

<div>
    <p class="title">
       <a href="/news/123456">title_1</a>
    </p>
</div>
<div>
    <p class="title">
       <a href="/news/789000">title_2</a>
    </p>
</div>

My code:

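from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

def web(WebUrl):
    site = urlparse(WebUrl)
    code = requests.get(WebUrl)
    plain = code.text
    s = BeautifulSoup(plain, "html.parser")
    # find() returns only the FIRST <p class="title"> on the page
    p_containers = s.find('p', {'class':'title'})
    for title in s.find_all('p', {'class':'title'}):
        line = title.get_text()
        print(line)
        # This searches p_containers (always the first <p>), not title,
        # so every iteration prints the same first link
        for link in p_containers.find_all('a'):
            line2 = link.get('href')
            print(site.netloc + str(line2))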

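Hi everyone, I need some help. My task is to extract the titles and links from a web page. I can extract the titles, but not the links: when I try to scrape the links, only the first link is captured, and every following link is ignored and replaced with that first one.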
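A minimal sketch of the behaviour described above, using the sample HTML from the question inlined as a string (the variable names are illustrative). find() returns only the first matching tag, while find_all() returns every match, so a loop over p_containers.find_all('a') can only ever reach the first title's link.

```python
from bs4 import BeautifulSoup

# Sample markup from the question, inlined so the example runs offline
html = """
<div><p class="title"><a href="/news/123456">title_1</a></p></div>
<div><p class="title"><a href="/news/789000">title_2</a></p></div>
"""

s = BeautifulSoup(html, "html.parser")

# find() stops at the first match and returns a single Tag (or None)
first_p = s.find("p", {"class": "title"})
# find_all() collects every match into a list-like ResultSet
all_p = s.find_all("p", {"class": "title"})

# Links reachable from the first <p> only (what the question's code does)
links_from_first = [a.get("href") for a in first_p.find_all("a")]
# Links searched inside each <p> in turn
links_per_title = [a.get("href") for p in all_p for a in p.find_all("a")]

print(links_from_first)   # ['/news/123456']
print(links_per_title)    # ['/news/123456', '/news/789000']
```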

九州编程

260 views, 2 answers

HUX布斯

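Your code has most of the pieces right; only one small part is wrong: the inner loop searches p_containers, which find() sets to the first <p class="title"> only, instead of searching the current title. I think the simplest way to get both the titles and the links is the following:

from bs4 import BeautifulSoup

site = """<div>
    <p class="title">
       <a href="/news/123456">title_1</a>
     </p>
</div>
<div>
    <p class="title">
       <a href="/news/789000">title_2</a>
     </p>
</div>"""

s = BeautifulSoup(site, "html.parser")
for title in s.find_all('p', {'class':'title'}):
    # Collect every href inside the current <p>, not just the first one
    links = [x['href'] for x in title.find_all('a', href=True)]
    line = title.get_text()
    print(line)
    print(links)

As you can see, links is a list, in case a title contains more than one link.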
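To illustrate why links is a list: in the hypothetical markup below, the first title wraps two links (the extra /news/123456/comments URL is made up for illustration). Iterating over title.find_all('a', href=True) keeps both, whereas a single find('a') call would return only the first.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first title contains two links
html = """
<div><p class="title"><a href="/news/123456">title_1</a> <a href="/news/123456/comments">comments</a></p></div>
<div><p class="title"><a href="/news/789000">title_2</a></p></div>
"""

s = BeautifulSoup(html, "html.parser")
results = []
for title in s.find_all("p", {"class": "title"}):
    # href=True skips <a> tags that have no href attribute
    links = [x["href"] for x in title.find_all("a", href=True)]
    # strip=True drops whitespace-only strings; " " joins the remaining pieces
    results.append((title.get_text(" ", strip=True), links))

for text, links in results:
    print(text, links)
```

With this markup the loop prints one line per title, and the first line carries both hrefs.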


慕码人8056858

Try it this way; it will find all the values:

from bs4 import BeautifulSoup

text = """<div>
    <p class="title">
       <a href="/news/123456">title_1</a>
     </p>
</div>
<div>
    <p class="title">
       <a href="/news/789000">title_2</a>
     </p>
</div>"""

soup = BeautifulSoup(text, 'html.parser')
for i in soup.find_all('p', attrs={'class': 'title'}):
    link = None
    # Look for the link inside the current <p> only
    if i.find('a'):
        link = i.find('a').get('href')
    print('Title:', i.get_text(strip=True), 'Link:', link)

# Output:
# Title: title_1 Link: /news/123456
# Title: title_2 Link: /news/789000
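One way to fold the per-title search back into the question's web() function, sketched under the assumption that the real page uses the same <p class="title"> containers. Parsing is split into a helper (the name extract_titles_and_links is hypothetical) so it can be checked without a network request. The sketch also swaps urllib.parse.urljoin in for site.netloc + href, because the concatenation drops the scheme and yields example.com/news/123456 rather than https://example.com/news/123456.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_titles_and_links(html, base_url):
    """Return (title, [absolute links]) for every <p class="title"> in html.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for p in soup.find_all("p", {"class": "title"}):
        # Search inside the current <p>, not a container found once up front
        links = [urljoin(base_url, a["href"]) for a in p.find_all("a", href=True)]
        results.append((p.get_text(strip=True), links))
    return results
```

Inside the question's web(), this would be called as extract_titles_and_links(requests.get(WebUrl).text, WebUrl), printing each title followed by its links. With the sample markup and a base URL of https://example.com/, the first entry would be ("title_1", ["https://example.com/news/123456"]).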

