BeautifulSoup 4: Extracting multiple titles and links from different p tags

HTML code:


<div>
    <p class="title">
       <a href="/news/123456">title_1</a>
    </p>
</div>

<div>
    <p class="title">
       <a href="/news/789000">title_2</a>
    </p>
</div>

My code:


from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

def web(WebUrl):
    site = urlparse(WebUrl)
    code = requests.get(WebUrl)
    plain = code.text
    s = BeautifulSoup(plain, "html.parser")
    p_containers = s.find('p', {'class':'title'})

    for title in s.find_all('p', {'class':'title'}):
        line = title.get_text()
        print(line)
        for link in p_containers.find_all('a'):
            line2 = link.get('href')
            print(site.netloc + str(line2))

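Hi everyone, I need some help. My task is to extract the titles and links from a web page. I can extract the titles, but not the links. When I try to scrape the links, only the first link is scraped correctly; every later link is ignored and replaced by the first one.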


九州编程
2 answers

HUX布斯

Most of your code is right; only one small thing is wrong. The inner loop searches p_containers, which s.find() set to the first <p class="title"> only, so every iteration prints the first link. Look for the links inside the current title element instead. I think the simplest way to get the titles and links is the following.

from bs4 import BeautifulSoup

site = """
<div>
    <p class="title">
       <a href="/news/123456">title_1</a>
    </p>
</div>
<div>
    <p class="title">
       <a href="/news/789000">title_2</a>
    </p>
</div>
"""

s = BeautifulSoup(site, "html.parser")

for title in s.find_all('p', {'class':'title'}):
    # Collect every href found inside this particular <p>
    links = [x['href'] for x in title.find_all('a', href=True)]
    line = title.get_text()
    print(line)
    print(links)

As you can see, links is a list, in case a title has more than one link.
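For reference, here is a minimal sketch of the same fix applied to a function shaped like the one in the question. The function name extract_titles_and_links and the base URL https://example.com/list are made up for illustration, and it takes the HTML as a string rather than fetching it with requests. It also swaps the original site.netloc + href concatenation for urllib.parse.urljoin, which keeps the scheme in the resulting URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_titles_and_links(html, base_url):
    """Return (title, absolute_url) pairs, one per <a> inside a <p class="title">."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for p in soup.find_all("p", class_="title"):
        # Search inside the current <p>, not inside the first one on the page.
        for a in p.find_all("a", href=True):
            # urljoin resolves the relative href against the page URL.
            results.append((a.get_text(strip=True), urljoin(base_url, a["href"])))
    return results


# Sample markup copied from the question; the base URL is a placeholder.
sample = """
<div><p class="title"><a href="/news/123456">title_1</a></p></div>
<div><p class="title"><a href="/news/789000">title_2</a></p></div>
"""

for title, url in extract_titles_and_links(sample, "https://example.com/list"):
    print(title, url)
```

Returning a list instead of printing inside the loop makes the result easier to reuse, for example to write the pairs to a file.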

慕码人8056858

Try it this way; it finds every title together with its link.

from bs4 import BeautifulSoup

text = """
<div>
    <p class="title">
       <a href="/news/123456">title_1</a>
    </p>
</div>
<div>
    <p class="title">
       <a href="/news/789000">title_2</a>
    </p>
</div>
"""

soup = BeautifulSoup(text, 'html.parser')

for i in soup.find_all('p', attrs={'class': 'title'}):
    # Default to None when the <p> contains no <a> tag
    link = None
    if i.find('a'):
        link = i.find('a').get('href')
    print('Title:', i.get_text(strip=True), 'Link:', link)

# Output:
# Title: title_1 Link: /news/123456
# Title: title_2 Link: /news/789000
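If each title holds exactly one link, a CSS selector is another possible way to write the same lookup. This is only a sketch, assuming a BeautifulSoup 4 installation where select() is available; the variable names html and pairs are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div><p class="title"><a href="/news/123456">title_1</a></p></div>
<div><p class="title"><a href="/news/789000">title_2</a></p></div>
"""

soup = BeautifulSoup(html, "html.parser")

# "p.title a[href]" matches every <a> with an href inside a <p class="title">.
pairs = [(a.get_text(strip=True), a["href"]) for a in soup.select("p.title a[href]")]

for title, href in pairs:
    print("Title:", title, "Link:", href)
```

Unlike the loop above, this version skips any title that has no link instead of reporting it with a link of None.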