Counting substrings in an HTML page with BeautifulSoup



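I need to use the BeautifulSoup module to find and count every occurrence of the words "python" and "c++" as substrings in the HTML code. On the Wikipedia page, these words occur 1 and 9 times respectively. Why does my code print 0 and 0?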

    from urllib.request import urlopen, urlretrieve
    from bs4 import BeautifulSoup

    resp = urlopen("https://stepik.org/media/attachments/lesson/209717/1.html")
    html = resp.read().decode('utf8')
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', attrs = {'class' : 'wikitable sortable'})

    cnt = 0
    for tr in soup.find_all("python"):
        cnt += 1
    print(cnt)

    cnt1 = 0
    for tr in soup.find_all("c++"):
        cnt += 1
    print(cnt)

PIPIONE


1 Answer

慕码人8056858

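You are doing it wrong. Calling find_all("python") looks for tags named "python", not for text. To search for text, you need to use the string argument.

    import re

    # This matches only text nodes whose entire content is exactly "Python"
    soup.find_all(string="Python")
    # It does not match "python" or "Python is best"

    # A regex fixes that, so substring cases match as well
    soup.find_all(string=re.compile(r"[cC]\+\+"))
    soup.find_all(string=re.compile(r"[Pp]ython"))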
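One caveat with the approach above: find_all(string=...) returns the matching text nodes, so taking the length of the result counts nodes, not occurrences. A node containing the word twice is counted once. Below is a minimal sketch of counting occurrences instead, assuming a case-insensitive match over the page's visible text is what the task wants. SAMPLE_HTML and count_substring are hypothetical names used only for illustration, and the markup is a stand-in, not the real page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page
SAMPLE_HTML = """
<html><body>
<p>Python is popular. I like python and C++.</p>
<table><tr><td>C++</td><td>c++ and C++ again</td></tr></table>
</body></html>
"""

def count_substring(html, word):
    """Count case-insensitive occurrences of `word` in the text of `html`."""
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text()  # all text nodes joined into one string
    # re.escape makes "+" in "c++" literal instead of a regex quantifier
    return len(re.findall(re.escape(word), text, flags=re.IGNORECASE))

python_count = count_substring(SAMPLE_HTML, "python")
cpp_count = count_substring(SAMPLE_HTML, "c++")
print(python_count, cpp_count)
```

To run this against the page from the question, pass the decoded html string in place of SAMPLE_HTML. If the count should cover the raw HTML source (tag names and attributes included) rather than only the text, apply re.findall to html directly. Note also that the original code increments cnt in both loops and never uses cnt1, so separate counters are needed either way.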

