BeautifulSoup: get multiple elements for all img tags in a div with a specific class

I am trying to get the links in the image-file attribute (relative links) of the img tags inside the div with id previewImages (I don't want the src links).


Here is the sample HTML:


<div id="previewImages">

  <div class="thumb"> <a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a> </div>

  <div class="thumb"> <a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /> </a> </div>

  <div class="thumb"> <a><img src="https://example.com/s/0.jpg" image-file="/image/0.jpg" /> </a> </div>

  <div class="thumb"> <a><img src="https://example.com/s/3.jpg" image-file="/image/3.jpg" /> </a> </div>

  <div class="thumb"> <a><img src="https://example.com/s/4.jpg" image-file="/image/4.jpg" /> </a> </div>

</div>

I tried the following, but it only gives me the first link, not all of them:


import sys

import urllib2

from bs4 import BeautifulSoup


quote_page = sys.argv[1] # this should be the first argument on the command line

page = urllib2.urlopen(quote_page)

soup = BeautifulSoup(page, 'html.parser')


images_box = soup.find('div', attrs={'id': 'previewImages'})

if images_box.find('img'):

    imagesurl = images_box.find('img').get('image-file')

print imagesurl

How do I get the links in the image-file attribute of all img tags inside the div with id previewImages?


慕斯709654
3 Answers

潇湘沐

Use .findAll (named find_all in current Beautiful Soup releases; findAll is the older alias). find returns only the first matching tag, while findAll returns every match.

Example:

from bs4 import BeautifulSoup

html = """<div id="previewImages">
  <div class="thumb"> <a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/0.jpg" image-file="/image/0.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/3.jpg" image-file="/image/3.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/4.jpg" image-file="/image/4.jpg" /> </a> </div>
</div>"""

soup = BeautifulSoup(html, "html.parser")
images_box = soup.find('div', attrs={'id': 'previewImages'})
# Loop over every img under the div, not just the first one
for link in images_box.findAll("img"):
    print(link.get('image-file'))

Output:

/image/15.jpg
/image/2.jpg
/image/0.jpg
/image/3.jpg
/image/4.jpg
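
To make the difference between find and find_all concrete, here is a minimal sketch under a few assumptions: the HTML is a shortened copy of the sample from the question, and the variable names first_only and all_links are illustrative only. It also guards against the div being absent, which the original snippet does not do.

```python
from bs4 import BeautifulSoup

# Shortened copy of the sample HTML from the question (an assumption for brevity).
html = """
<div id="previewImages">
  <div class="thumb"><a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a></div>
  <div class="thumb"><a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /></a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
images_box = soup.find("div", attrs={"id": "previewImages"})

first_only = None
all_links = []
if images_box is not None:  # find returns None when the div is missing
    # find stops at the first matching tag, which is why the question's
    # code only ever printed one link.
    first_img = images_box.find("img")
    if first_img is not None:
        first_only = first_img.get("image-file")
    # find_all returns every matching tag under the div.
    all_links = [img.get("image-file") for img in images_box.find_all("img")]

print(first_only)  # /image/15.jpg
print(all_links)   # ['/image/15.jpg', '/image/2.jpg']
```

Using .get here means an img without an image-file attribute yields None instead of raising a KeyError.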

萧十郎

I think using the id together with an attribute selector passed to select would be faster:

from bs4 import BeautifulSoup as bs

html = '''<div id="previewImages">
  <div class="thumb"> <a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/0.jpg" image-file="/image/0.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/3.jpg" image-file="/image/3.jpg" /> </a> </div>
  <div class="thumb"> <a><img src="https://example.com/s/4.jpg" image-file="/image/4.jpg" /> </a> </div>
</div>'''

# The 'lxml' parser requires the lxml package to be installed
soup = bs(html, 'lxml')
# Select every element under #previewImages that has an image-file attribute
links = [item['image-file'] for item in soup.select('#previewImages [image-file]')]
print(links)
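
As a sketch of how the selector scopes the match, the example below adds two things that are not in the original sample and are purely assumptions for illustration: an img inside the div without an image-file attribute, and an img outside the div that has one. It uses the built-in html.parser rather than lxml so it runs without extra packages, and it narrows the selector to img tags. It does not measure speed, so the "faster" claim stays untested here.

```python
from bs4 import BeautifulSoup

# Shortened sample plus two hypothetical extra tags (placeholder.jpg, banner.jpg).
html = """
<div id="previewImages">
  <div class="thumb"><a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a></div>
  <div class="thumb"><a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /></a></div>
  <div class="thumb"><a><img src="https://example.com/s/placeholder.jpg" /></a></div>
</div>
<img src="https://example.com/s/banner.jpg" image-file="/image/banner.jpg" />
"""

soup = BeautifulSoup(html, "html.parser")

# "#previewImages img[image-file]" matches only img tags that are descendants
# of the element with id previewImages AND carry an image-file attribute.
links = [img["image-file"] for img in soup.select("#previewImages img[image-file]")]

print(links)  # ['/image/15.jpg', '/image/2.jpg']
```

Because the selector already requires the attribute, indexing with img["image-file"] cannot raise a KeyError here.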

陪伴而非守候

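To add to the above, the same scenario with lxml looks like this:

import lxml.html

# sample is assumed to hold the HTML string from the question
tree = lxml.html.fromstring(sample)
# Collect the image-file attribute of every img in the document
images = tree.xpath("//img/@image-file")
print(images)

Output:

['/image/15.jpg', '/image/2.jpg', '/image/0.jpg', '/image/3.jpg', '/image/4.jpg']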
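
The XPath above matches every img in the document. If the page has other images, the expression can be restricted to the previewImages div. The following is a sketch under assumptions: sample is a shortened copy of the question's HTML wrapped in html and body tags, and the banner image outside the div is a hypothetical addition to show the scoping.

```python
import lxml.html

# Shortened sample wrapped in a full document; banner.jpg is a hypothetical extra.
sample = """
<html><body>
<div id="previewImages">
  <div class="thumb"><a><img src="https://example.com/s/15.jpg" image-file="/image/15.jpg" /></a></div>
  <div class="thumb"><a><img src="https://example.com/s/2.jpg" image-file="/image/2.jpg" /></a></div>
</div>
<img src="https://example.com/s/banner.jpg" image-file="/image/banner.jpg" />
</body></html>
"""

tree = lxml.html.fromstring(sample)

# Only img elements that are descendants of the div with id previewImages.
scoped = [str(v) for v in tree.xpath('//div[@id="previewImages"]//img/@image-file')]
# The unscoped expression from the answer also picks up the banner image.
unscoped = [str(v) for v in tree.xpath("//img/@image-file")]

print(scoped)    # ['/image/15.jpg', '/image/2.jpg']
print(unscoped)  # ['/image/15.jpg', '/image/2.jpg', '/image/banner.jpg']
```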