我是美丽汤的新手,我试图从这个网站中提取一些原始数据,我做了解析。
from urllib.request import urlopen
from bs4 import BeautifulSoup
path='https://www.esquire.com/entertainment/tv/g28380481/best-anime-2019/'
f = urlopen(path)
html = str(f.read())
soup = BeautifulSoup(html, 'html.parser')
txt = soup.find_all('iframe')
我得到了这个bs4对象
[<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/6M7f41OJfcM?enablejsapi=1" frameborder="0"></iframe>,
<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/0glqBjvku84?enablejsapi=1" frameborder="0"></iframe>,
<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/YKJf876thxw?enablejsapi=1" frameborder="0"></iframe>,
<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/SdFgPGSmy0Y?enablejsapi=1" frameborder="0"></iframe>,
<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/Ie-bo3IulmY?enablejsapi=1" frameborder="0"></iframe>,
<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/ApLudqucq-s?enablejsapi=1" frameborder="0"></iframe>,
<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/FpRk3m3Y-Zg?enablejsapi=1" frameborder="0"></iframe>,
<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/J9tu253SOas?enablejsapi=1" frameborder="0"></iframe>,
<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/lCPf9SA4mgU?enablejsapi=1" frameborder="0"></iframe>,
<iframe allowfullscreen="true" data-src="//www.youtube.com/embed/neqxQdpTyXE?enablejsapi=1" frameborder="0"></iframe>]
现在我想从每个元素中提取网站,我已经尝试了下面的代码。我会知道要使用哪个美丽汤命令,而不是将每个元素替换为字符串a进行搜索。
import re
trailers=[]
pattern='(www.+1)'
for line in txt:
line=str(line)
trailers.append(re.search(pattern,line).group(0))
函数式编程
相关分类