我正在尝试使用该findChildren()功能。我基本上想要所有<p>在特定<h3>标签下。我正在尝试一些简单的代码,但集合children。我要回来的是空的。h3返回正确的行(请参见print(h3)注释)和print(type(children))打印类型:<class 'bs4.element.ResultSet'>。请告诉我我在做什么错。
soup = BeautifulSoup(contents, 'html.parser')
h3 = soup.find('h3', text=re.compile('chapter', re.IGNORECASE))
print(h3) #result prints <h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
children = h3.findChildren('p')
print(type(children)) #returns type: <class 'bs4.element.ResultSet'>
我也试过h3.findChildren('p', Recursive=True)和children = h3.findChildren(Recursive=True)。里面也回来空了。
这是我要抓取的HTML部分:
<h3 style="text-align: center;">CHAPTER ONE - STEPHANUS GRAYLAND</h3>
<p dir="ltr" style="line-height: 1.15; margin-top: 0pt; margin-bottom: 0pt;">
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">Stephanus Grayland did not try to hide his smile of satisfaction . He had “eaten” lunch, but now, he sensed, he would truly </span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; font-style: italic; vertical-align: baseline; white-space: pre-wrap;">feast</span>
<span style="font-size: 16px; font-family: 'Times New Roman'; background-color: transparent; vertical-align: baseline; white-space: pre-wrap;">.</span>
</p>
<p></p>
相关分类