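Extracting specific string matches from a stock website page

I am trying to scrape stock market caps with the code below. At first I tried the conventional approach of getting the list of market cap values with bs4. When I ran print(x.find('span',{'class': 'Trsdu(0.3s)'}).text), I got the error AttributeError: 'NoneType' object has no attribute 'text'.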


  for x in marketCapArray:
      print(x.find('span', {'class': 'Trsdu(0.3s)'}).text)

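I don't know how to fix this error in my particular code, so I took a different approach and used regular expressions to simply extract the values I need, as attempted below.


Main code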


import re
from urllib.request import urlopen

from bs4 import BeautifulSoup


def pickTopGainers():
    url = 'https://in.finance.yahoo.com/gainers?offset=0&count=100'
    page = urlopen(url)
    soup = BeautifulSoup(page, "html.parser")

    # Collect every "Market cap" cell in the gainers table.
    marketCapArray = soup.find_all('td', {
        'class': 'Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)',
        'aria-label': 'Market cap'})
    print(str(marketCapArray))

    # Grab the 8 characters before each closing span tag.
    xi = re.findall("........</span>", str(marketCapArray))  # regex-use-1
    # Strip the leftover markup and the N/A entries.
    pi = re.sub("(</span>|....>N/A|>|\")", "", str(xi))
    print(pi)


pickTopGainers()

Result


This is what print(str(marketCapArray)) outputs (only part of it is pasted here):


[<td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="93"><span class="Trsdu(0.3s)" data-reactid="94">159.404M</span></td>, 

<td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="119"><span class="Trsdu(0.3s)" data-reactid="120">533.97M</span></td>, 

<td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="145"><span data-reactid="146">N/A</span></td>, 

<td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="171"><span class="Trsdu(0.3s)" data-reactid="172">2.952B</span></td>, 

<td aria-label="Market cap" class="Va(m) Ta(end) Pstart(20px) Pend(10px) W(120px) Fz(s)" colspan="" data-reactid="197"><span class="Trsdu(0.3s)" data-reactid="198">9.223B</span></td>, 

This is the output of print(pi), which is also the final output:


['159.404M', '533.97M', '', '2.952B', '9.223B', '']


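Question

How can I avoid the regex substitution (re.sub) in the main code above and still get the final output pi shown? Or please suggest the right way to do this. I find my regex rather ugly.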


临摹微笑
1 answer

qq_遁去的一_1

You can iterate row by row over the <table> that holds all the information. For example:

import requests
from bs4 import BeautifulSoup

url = 'https://in.finance.yahoo.com/gainers?offset=0&count=100'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

fmt_string = '{:<15} {:<60} {:<10} {:<10} {:<10} {:<10} {:<10} {:<10} {:<10}'
print(fmt_string.format('Symbol', 'Name', 'Price(int)', 'Change', '% change', 'Volume', 'AvgVol(3M)', 'Market Cap', 'PE ratio'))

for row in soup.select('table:has(a[href*="/quote/"]) > tbody > tr'):
    cells = [td.get_text(strip=True) for td in row.select('td')]
    print(fmt_string.format(*cells[:-1]))

Prints:

Symbol          Name                                                         Price(int) Change     % change   Volume     AvgVol(3M) Market Cap PE ratio
CCCL.NS         Consolidated Construction Consortium Limited                 0.2000     +0.0500    +33.33%    57,902     290,154    159.404M   N/A
KSERASERA.NS    KSS Limited                                                  0.2500     +0.0500    +25.00%    1.607M     2.601M     533.97M    N/A
BONLON.BO       BONLON INDUSTRIES LIMITED                                    21.60      +3.60      +20.00%    16,000     N/A        N/A        N/A
MENONBE.NS      Menon Bearings Limited                                       52.80      +8.80      +20.00%    2.334M     65,713     2.952B     25.05
RPOWER.NS       Reliance Power Limited                                       3.3000     +0.5500    +20.00%    127.814M   18.439M    9.223B     N/A
11DPD.BO        Nippon India Mutual Fund                                     0.0600     +0.0100    +20.00%    190        N/A        N/A        N/A
ABFRLPP-E1.NS   Aditya Birla Rs.5 ppd up                                     105.65     +17.60     +19.99%    1.238M     N/A        N/A        N/A
500110.BO       Chennai Petroleum Corporation Limited                        64.55      -0.15      -0.23%     42,765     61,584     9.612B     N/A
ABFRLPP.BO      Aditya Birla Fashion and Retai                               106.05     +17.65     +19.97%    387,703    N/A        N/A        N/A
RADIOCITY.NS    Music Broadcast Limited                                      21.35      +3.55      +19.94%    12.657M    1.013M     7.38B      124.13
RADIOCITY.BO    Music Broadcast Limited                                      21.35      +3.55      +19.94%    898,070    90,236     7.38B      124.13
MENONBE.BO      Menon Bearings Limited                                       52.65      +8.75      +19.93%    137,065    8,648      2.951B     24.98
MTNL.BO         Mahanagar Telephone Nigam Limited                            10.72      +1.78      +19.91%    1.142M     156,275    6.754B     N/A

...and so on.
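If you only need the market cap list from the question, the same idea can be narrowed to that one column. Judging from the cells printed in the question, the span inside the N/A cell has no class attribute, so x.find('span', {'class': 'Trsdu(0.3s)'}) would return None for that cell, which may explain the AttributeError. Reading the text of the td itself sidesteps this. The following is a minimal sketch, not a definitive implementation: SAMPLE_HTML is a trimmed copy of the cells shown in the question rather than a live fetch, and the names extract_market_caps and na_value are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the cells printed in the question (assumption: the live
# page still uses aria-label="Market cap" on these cells).
SAMPLE_HTML = """
<table><tbody><tr>
<td aria-label="Market cap"><span class="Trsdu(0.3s)">159.404M</span></td>
<td aria-label="Market cap"><span class="Trsdu(0.3s)">533.97M</span></td>
<td aria-label="Market cap"><span>N/A</span></td>
<td aria-label="Market cap"><span class="Trsdu(0.3s)">2.952B</span></td>
</tr></tbody></table>
"""


def extract_market_caps(html, na_value=""):
    """Return the text of every "Market cap" cell, mapping N/A to na_value.

    Hypothetical helper for illustration; not part of bs4.
    """
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for td in soup.find_all("td", {"aria-label": "Market cap"}):
        # Read the cell's own text, so a span without a class is not a problem.
        text = td.get_text(strip=True)
        values.append(na_value if text == "N/A" else text)
    return values


market_caps = extract_market_caps(SAMPLE_HTML)
print(market_caps)
```

On the sample cells this gives an empty string in place of N/A, like the pi output in the question, without any re.sub. For the live page you would pass requests.get(url).content instead of SAMPLE_HTML, assuming the markup has not changed.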