Extracting the src attribute

What I want to do:

Given this HTML:


<img class="poster lazyload lazyloaded"
     data-src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     data-srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     alt="Hitman"
     src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     data-loaded="true">

I want to extract the value of the "data-src" or "src" attribute (or of any attribute that contains the image URL).


What I tried:

Posters = soup.find("img")["src"]

print(Posters)

But this obviously matches img tags in general, so the links I get have nothing to do with the posters. Output:


https://www.themoviedb.org/assets/2/v4/logos/v2/blue_short-8e7b30f73a4020692ccca9c88bafe5dcb6f8a62a4c6bc55cd9ba82bb2cd95f6c.SVG

https://www.themoviedb.org/assets/2/v4/logos/v2/blue_short-8e7b30f73a4020692ccca9c88bafe5dcb6f8a62a4c6bc55cd9ba82bb2cd95f6c.SVG

By posters I mean the movie posters (see this URL: https://www.themoviedb.org/search?&query=Hitman).


Summary

I want to extract the attribute values from img tags that have the class "lazyloaded".


I hope everything is clear. Thanks.


慕田峪7331174
127 views, 1 answer
1 Answer

饮歌长啸

You can try filtering by class:

posters = soup.find_all("img", {"class": "lazyloaded"})
for poster in posters:
    print(poster["src"])

See the documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#searching-by-css-class

Edit: more explanation

Suppose you have the following file, demo.html:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Title</title>
</head>
<body>
<img class="logo" src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg">
<img class="poster lazyload lazyloaded"
     data-src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     data-srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     alt="Hitman"
     src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     data-loaded="true">
</body>
</html>

You can parse the "poster" images like this:

import io
from bs4 import BeautifulSoup

# Read the file and build the parse tree with the built-in HTML parser.
with io.open("demo.html", encoding="utf8") as fd:
    soup = BeautifulSoup(fd.read(), features="html.parser")

# Keep only the img tags whose class list contains "lazyloaded".
posters = soup.find_all("img", {"class": "lazyloaded"})
for poster in posters:
    print(poster["src"])

You get:

https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg
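
The question also asks for every attribute that holds an image URL. The following is a minimal sketch of one way to do that, not something from the original answer. It assumes the markup may arrive without the "lazyloaded" class or the "src" attribute, which can happen when such attributes are added by JavaScript in the browser, so it selects on the "poster" class and falls back to "data-src" and "data-srcset". The helper name poster_urls, the inline HTML string, and the example.com logo URL are all hypothetical, and splitting srcset on commas is a simplification that would break on URLs containing commas.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question's snippet. The "logo" image uses a
# placeholder URL and is there only to show that it gets filtered out.
HTML = """
<img class="logo" src="https://example.com/logo.svg">
<img class="poster lazyload"
     data-src="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg"
     data-srcset="https://image.tmdb.org/t/p/w94_and_h141_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 1x, https://image.tmdb.org/t/p/w188_and_h282_bestv2/3qlQM9KP1cyvNfPChA9rASASdHr.jpg 2x"
     alt="Hitman">
"""


def poster_urls(tag):
    """Collect the distinct image URLs on one <img> tag (hypothetical helper)."""
    urls = []
    # Single-URL attributes: take the value as-is when present.
    for attr in ("src", "data-src"):
        value = tag.get(attr)
        if value and value not in urls:
            urls.append(value)
    # srcset-style attributes: comma-separated "URL descriptor" pairs.
    for attr in ("srcset", "data-srcset"):
        for candidate in tag.get(attr, "").split(","):
            parts = candidate.split()
            if parts and parts[0] not in urls:
                urls.append(parts[0])
    return urls


soup = BeautifulSoup(HTML, "html.parser")

# The CSS selector "img.poster" matches any img whose class list contains "poster".
results = {img.get("alt", ""): poster_urls(img) for img in soup.select("img.poster")}

for title, urls in results.items():
    print(title, urls)
```

Using tag.get(...) instead of tag[...] avoids a KeyError when an attribute is missing, which matters if some posters have only "data-src".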