Python 3 BeautifulSoup: get the value of the span next to a span with specific text, when its position varies

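I tried searching for this here but honestly couldn't find an answer. This would be easy to do with Selenium, but since performance is an important factor, I'm considering using BeautifulSoup instead.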


Scenario: I need to scrape the prices of different items that are generated in random order depending on user input. See the HTML below:


<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Third Party Liability</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Fire &amp; Theft</span>
    <span>€15.59</span>
  </li>
</ul>
</div>

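If these options were static and always appeared at the same position in the HTML, scraping the prices would be easy. But since they can be placed anywhere inside the div sk-expander-content, I'm not sure how to find them dynamically.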


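The best approach would be a method that takes the span text we are looking for and returns the euro value. The structure of the span tags is always the same: the first span is always the item name and the second span is always the price.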


The first thing that came to mind is the code below, but I'm not sure whether it is robust enough or even makes sense:


html = driver.page_source
soup = BeautifulSoup(html, "html.parser")

div_i_need = soup.find_all("div", class_="sk-expander-content")[1]

def price_scraper(text_to_find):
    for el in div_i_need.find_all(['ul', 'li', 'span']):
        if el.name == 'span':
            if el[0].text == text_to_find:
                return(el[1].text)

Any help would be greatly appreciated.


呼如林
2 Answers

猛跑小猪

Use a regular expression.

import re
from bs4 import BeautifulSoup

html = '''<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Third Party Liability</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Fire &amp; Theft</span>
    <span>€15.59</span>
  </li>
</ul>
</div>
<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Fire &amp; Theft</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Third Party Liability</span>
    <span>€15.59</span>
  </li>
</ul>
</div>'''

soup = BeautifulSoup(html, "html.parser")
for item in soup.find_all(class_="sk-expander-content"):
    # Raw string with an escaped dot, so only prices like €756.62 match.
    # string= is the current name of the older text= argument.
    for span in item.find_all('span', string=re.compile(r"€(\d+)\.(\d+)")):
        # The item name is the span just before the price span
        print(span.find_previous_sibling('span').text)
        print(span.text)

Output:

Third Party Liability
€756.62
Fire & Theft
€15.59
Fire & Theft
€756.62
Third Party Liability
€15.59

Update: If you only want the values from the first node, use find() instead of find_all(). Reusing the imports and the html string from above:

soup = BeautifulSoup(html, "html.parser")
for span in soup.find(class_="sk-expander-content").find_all('span', string=re.compile(r"€(\d+)\.(\d+)")):
    print(span.find_previous_sibling('span').text)
    print(span.text)
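Building on the sibling pairing above, here is a minimal sketch that collects one block into a name-to-price dictionary, so a price can be looked up by item name wherever it sits in the list. The helper name `prices_by_name` is hypothetical, and the sketch assumes what the question states: the first span in each `li` is the item name and the second is the price.

```python
from bs4 import BeautifulSoup

html = """
<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Third Party Liability</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Fire &amp; Theft</span>
    <span>€15.59</span>
  </li>
</ul>
</div>
"""


def prices_by_name(container):
    """Map item name -> price text for one sk-expander-content block.

    Hypothetical helper: assumes the first span in each <li> is the
    item name and the second span is the price.
    """
    prices = {}
    for li in container.find_all("li"):
        spans = li.find_all("span")
        if len(spans) >= 2:
            name = spans[0].get_text(strip=True)
            prices[name] = spans[1].get_text(strip=True)
    return prices


soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", class_="sk-expander-content")
prices = prices_by_name(container)
print(prices.get("Fire & Theft"))
```

Because the parser decodes `&amp;`, the lookup key is written with a plain `&`. Using `dict.get` returns None for an item that is not on the page instead of raising.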

慕盖茨4494581

from bs4 import BeautifulSoup
import re

html = """<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Third Party Liability</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Fire &amp; Theft</span>
    <span>€15.59</span>
  </li>
</ul>
</div>"""

soup = BeautifulSoup(html, 'html.parser')
target = soup.select("div.sk-expander-content")

for tar in target:
    # Keep only the spans whose text contains a euro sign
    data = [item.text for item in tar.find_all("span", string=re.compile("€"))]
    print(data)

Output:

['€756.62', '€15.59']

Note: I used select(), which returns a ResultSet, to find all the matching div elements.
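For the method the question asks for (pass the item text, get the euro value back), one possible sketch follows. It keeps the question's name `price_scraper` but takes the container as a parameter, which is an assumption of this sketch. It also assumes the item text matches the span content exactly. Note that in the question's attempt, `el[0]` on a Tag looks up an HTML attribute rather than a neighbouring span, so it raises KeyError; here the price is reached with `find_next_sibling` instead.

```python
from bs4 import BeautifulSoup

html = """
<div class="sk-expander-content" style="display: block;">
<ul>
  <li>
    <span>Third Party Liability</span>
    <span>€756.62</span>
  </li>
  <li>
    <span>Fire &amp; Theft</span>
    <span>€15.59</span>
  </li>
</ul>
</div>
"""


def price_scraper(container, text_to_find):
    """Return the price text for the item named text_to_find, or None.

    Sketch only: assumes the name span's text equals text_to_find
    exactly and that the price is the next sibling span.
    """
    # Find the span whose text is exactly the item name
    label = container.find("span", string=text_to_find)
    if label is None:
        return None
    # The price is the span that follows the name inside the same <li>
    price = label.find_next_sibling("span")
    return price.get_text(strip=True) if price is not None else None


soup = BeautifulSoup(html, "html.parser")
# The question picks the second matching div with [1]; this sample has one.
container = soup.find("div", class_="sk-expander-content")
print(price_scraper(container, "Fire & Theft"))
```

Since the lookup goes by text rather than by position, the order of the `li` elements does not matter. If the page pads the names with whitespace, the exact match would need to be relaxed, for example with a compiled regular expression passed to `string=`.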