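I scraped some HTML from a website and need to extract a specific tag from it. The problem is that the markup is confusingly formatted, and I can't get the whole tag. Here is an example: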
data = """
<div class="Answer">
1. BOUNDARIES - EPB & APL <i>(inferior)</i>, EPL <i>(superior). </i><div>2. FLOOR (proximal to distal) - radial styloid => scaphoid => trapezium => 1st MC base. <br /><div>3. CONTENTS - cutaneous branches of radial nerve <i>(on the roof),</i> cephalic vein <i>(begins here),</i> radial artery <i>(on the floor).</i></div></div><div><br /></div><div><img src="paste-27a44c801f0776d91f5f6a16a963bff67f0e8ef3.jpg" /><br /></div><div><b>Image: </b>Case courtesy of Dr Sachintha Hapugoda, <a href="https://radiopaedia.org/">Radiopaedia.org</a>. From the case <a href="https://radiopaedia.org/cases/52525">rID: 52525</a> [Accessed 15 Nov. 2018].</div>
</div>
"""
From the above, I only want to get this:
<div><b>Image: </b>Case courtesy of Dr Sachintha Hapugoda, <a href="https://radiopaedia.org/">Radiopaedia.org</a>. From the case <a href="https://radiopaedia.org/cases/52525">rID: 52525</a> [Accessed 15 Nov. 2018].</div>
I wrote the following code:
from bs4 import BeautifulSoup

soup = BeautifulSoup(data, "html.parser")
image_link = soup.find('div').find('b').next.next
print(image_link)
But it only gives me the text:
Case courtesy of Dr Sachintha Hapugoda, <a href="https://radiopaedia.org/">Radiopaedia.org</a>. From the case <a href="https://radiopaedia.org/cases/52525">rID: 52525</a> [Accessed 15 Nov. 2018].
How can I get the whole tag?
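
One possible approach, as a minimal sketch: `.next` walks forward to the next element in the document (here, text nodes), whereas the enclosing tag can be reached by going upward from `<b>` with `.parent` or `find_parent("div")`. This assumes the first `<b>` in the document is the one inside the wanted `<div>`. The `data` string below is an abbreviated stand-in for the HTML above, and `image_div` is just an illustrative name.

```python
from bs4 import BeautifulSoup

# Abbreviated stand-in for the scraped HTML shown in the question.
data = """
<div class="Answer">
1. BOUNDARIES <i>(inferior)</i><div>2. FLOOR <br /><div>3. CONTENTS</div></div><div><br /></div><div><img src="paste.jpg" /><br /></div><div><b>Image: </b>Case courtesy of Dr Sachintha Hapugoda, <a href="https://radiopaedia.org/">Radiopaedia.org</a>. From the case <a href="https://radiopaedia.org/cases/52525">rID: 52525</a> [Accessed 15 Nov. 2018].</div>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

# Assumption: the first <b> in the document sits inside the <div> we want.
b_tag = soup.find("b")

# Go up to the nearest enclosing <div> instead of forward to the next node.
image_div = b_tag.find_parent("div") if b_tag is not None else None

# Printing a Tag gives its full markup, including the opening and closing tags.
print(image_div)
```

If the page could contain other `<b>` tags, a stricter match such as `soup.find("b", string="Image: ")` may be safer before calling `find_parent`.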