Scraping a table whose cells all share the same class attribute

I am trying to scrape the prayer times from the website www.hujjat.org.


This is the HTML for the region I am interested in (you may have noticed that all 4 prayers have the same class attribute):


<table width="100%">
    <tbody>
        <tr>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Fajr</div>
                <div class="NamaazTime">04:42</div>
            </td>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Sunrise</div>
                <div class="NamaazTime">06:32</div>
            </td>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Zohr</div>
                <div class="NamaazTime">13:02</div>
            </td>
            <td class="NamaazTimes">
                <div class="NamaazTimeName">Maghrib</div>
                <div class="NamaazTime">19:33</div>
            </td>
        </tr>
    </tbody>
</table>

So far I have written the following code:


# import libraries
import json
import urllib2
from bs4 import BeautifulSoup

# specify the url
quote_page = 'http://www.hujjat.org/'

# query the website and return the html to the variable 'page'
page = urllib2.urlopen(quote_page)

# parse the html using Beautiful Soup and store it in the variable 'soup'
soup = BeautifulSoup(page, 'html.parser')

table = soup.find("div", class_="NamaazTimeName", text="Fajr").find_previous("table")

for row in table.find_all("tr"):
    a = row.find_all("td")
    # print(row.find_all("td"))

print(a)

My result is:


[<td class="NamaazTimes">\n<div class="NamaazTimeName">Fajr</div>\n<div class="NamaazTime">04:42</div>\n</td>, <td class="NamaazTimes">\n<div class="NamaazTimeName">Sunrise</div>\n<div class="NamaazTime">06:32</div>\n</td>, <td class="NamaazTimes">\n<div class="NamaazTimeName">Zohr</div>\n<div class="NamaazTime">13:02</div>\n</td>, <td class="NamaazTimes">\n<div class="NamaazTimeName">Maghrib</div>\n<div class="NamaazTime">19:33</div>\n</td>]

All I want from my code is the time for each prayer. For example, for the "Fajr" prayer the output should be "04:42". I then want to save this "04:42" to a text file.


Can anyone help me?


海绵宝宝撒
3 answers

慕工程0101907

    from bs4 import BeautifulSoup
    import pandas as pd

    # html: the page source as a string (for example, the table from the question)
    data = BeautifulSoup(html, 'html.parser')

    # The name and time divs appear in the same order, so pair them by position
    NamaazName = data.find_all('div', {'class': 'NamaazTimeName'})
    NamaazTime = data.find_all('div', {'class': 'NamaazTime'})

    coll = {}
    for i in range(len(NamaazName)):
        coll[NamaazName[i].text] = NamaazTime[i].text

    master_data = pd.DataFrame()
    master_data['NamaazName'] = list(coll.keys())
    master_data['NamaazTime'] = list(coll.values())
    print(master_data)

Output:

      NamaazName  NamaazTime
    0        Fajr       04:42
    1     Sunrise       06:32
    2        Zohr       13:02
    3     Maghrib       19:33
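If pandas is not needed, the same pairing can probably be done with a plain dictionary, which also makes it easy to look up a single prayer and write the times to a text file as the question asks. The following is only a minimal sketch in Python 3: the markup is copied from the question (the live page may differ), and the helper names `prayer_times` and `save_times` as well as the file name `prayer_times.txt` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the live page may look different.
HTML = """
<table width="100%"><tbody><tr>
  <td class="NamaazTimes">
    <div class="NamaazTimeName">Fajr</div><div class="NamaazTime">04:42</div>
  </td>
  <td class="NamaazTimes">
    <div class="NamaazTimeName">Sunrise</div><div class="NamaazTime">06:32</div>
  </td>
  <td class="NamaazTimes">
    <div class="NamaazTimeName">Zohr</div><div class="NamaazTime">13:02</div>
  </td>
  <td class="NamaazTimes">
    <div class="NamaazTimeName">Maghrib</div><div class="NamaazTime">19:33</div>
  </td>
</tr></tbody></table>
"""


def prayer_times(html):
    """Return a dict mapping each prayer name to its time (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    times = {}
    # Walk each cell so that a name is always paired with the time in the same <td>.
    for cell in soup.find_all("td", class_="NamaazTimes"):
        name = cell.find("div", class_="NamaazTimeName")
        time = cell.find("div", class_="NamaazTime")
        if name and time:  # skip cells that lack either div
            times[name.get_text(strip=True)] = time.get_text(strip=True)
    return times


def save_times(times, path):
    """Write one 'name time' pair per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        for name, time in times.items():
            fh.write(f"{name} {time}\n")


times = prayer_times(HTML)
print(times["Fajr"])  # 04:42
save_times(times, "prayer_times.txt")  # assumed output file name
```

Pairing inside each `td` rather than by list position means a cell with a missing name or time is skipped instead of shifting every later pair. To run this against the real site, the `HTML` string would be replaced by the downloaded page source.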