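Counting substrings in an HTML page with BeautifulSoup

I need to use the BeautifulSoup module to find and count every occurrence of the words "python" and "c++" as substrings in the HTML code. On Wikipedia these words appear 1 and 9 times respectively. Why does my code print 0 and 0?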


from urllib.request import urlopen, urlretrieve
from bs4 import BeautifulSoup

resp = urlopen("https://stepik.org/media/attachments/lesson/209717/1.html")
html = resp.read().decode('utf8')
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', attrs = {'class' : 'wikitable sortable'})

cnt = 0
for tr in soup.find_all("python"):
    cnt += 1
print(cnt)

cnt1 = 0
for tr in soup.find_all("c++"):
    cnt += 1
print(cnt)


PIPIONE

慕码人8056858

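You're using find_all incorrectly. Its first positional argument is a tag name, so soup.find_all("python") looks for <python> elements. There are none, so both loops never run and the counts stay at 0. To search the text you need the string argument:

import re

# An exact string only matches text nodes that are exactly "Python", e.g. <b>Python</b>
soup.find_all(string="Python")
# It does not match <b>python</b> or <b>Python is best</b>
# A regular expression fixes that: it matches any text node that contains the substring
soup.find_all(string=re.compile(r"[cC]\+\+"))
soup.find_all(string=re.compile(r"[Pp]ython"))

Note that these calls return the matching text nodes, so len() of the result counts nodes, not individual occurrences inside a node.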
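If you need the number of occurrences rather than the number of matching text nodes, here is a minimal sketch. It assumes bs4 is installed and uses a small made-up HTML string in place of the downloaded page. The helper names count_matching_strings and count_occurrences are hypothetical, and the regex patterns are the ones from the answer above. Be aware that find_all(string=True) also yields text inside <script>/<style> and comments, which you may or may not want to count.

```python
import re

from bs4 import BeautifulSoup


def count_matching_strings(soup, pattern):
    """Hypothetical helper: number of text nodes containing at least one match
    (this is what len(soup.find_all(string=pattern)) gives)."""
    return len(soup.find_all(string=pattern))


def count_occurrences(soup, pattern):
    """Hypothetical helper: total number of matches summed over all text nodes."""
    return sum(len(pattern.findall(s)) for s in soup.find_all(string=True))


# Made-up sample page; in the question the HTML comes from urlopen(...).read().decode('utf8').
html = '<p>Python, python and more python.</p><p>C++ and c++</p><a href="python.html">Java</a>'
soup = BeautifulSoup(html, "html.parser")

# Same patterns as in the answer, written as raw strings.
python_re = re.compile(r"[Pp]ython")
cpp_re = re.compile(r"[cC]\+\+")

print(count_matching_strings(soup, python_re), count_occurrences(soup, python_re))  # 1 3
print(count_matching_strings(soup, cpp_re), count_occurrences(soup, cpp_re))        # 1 2

# Raw-source alternative: search the HTML string itself, including tag attributes.
print(len(python_re.findall(html)), len(cpp_re.findall(html)))  # 4 2
```

The question doesn't say whether the expected counts refer to visible text only or to the raw HTML source (where, for example, link URLs also contain "python"), so the two approaches can give different totals on a real page.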