
Getting the text between spans with BeautifulSoup

I am trying to scrape various websites using BeautifulSoup in Python. Suppose I have the following HTML excerpt:


<div class="member_biography">
<h3>Biography</h3>
<span class="sub_heading">District:</span> AnyState - At Large<br/>
<span class="sub_heading">Political Highlights:</span> AnyTown City Council, 19XX-XX<br/>
<span class="sub_heading">Born:</span> June X, 19XX; AnyTown, Calif.<br/>
<span class="sub_heading">Residence:</span> Some Town<br/>
<span class="sub_heading">Religion:</span> Episcopalian<br/>
<span class="sub_heading">Family:</span> Wife, Some Name; two children<br/>
<span class="sub_heading">Education:</span> Some State College, A.A. 19XX; Some Other State College, B.A. 19XX<br/>
<span class="sub_heading">Elected:</span> 19XX<br/>
</div>

I need the results in the following format:


District:              AnyState - At Large
Political Highlights:  AnyTown City Council, 19XX-XX
Born:                  June X, 19XX; AnyTown, Calif.
Residence:             Some Town
Religion:              Episcopalian
Family:                Wife, Some Name; two children
Education:             Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
Elected:               19XX

However, so far I have only managed to get the following:


District:
Political Highlights:
Born:
Residence:
Religion:
Family:
Education:
Elected:

using the following code:


import urllib.request
import sys
from bs4 import BeautifulSoup


def main(url):
    # Download the raw page bytes.
    fp = urllib.request.urlopen(url)
    site_bytearray = fp.read()
    fp.close()

    #bs_data = BeautifulSoup(site_str,features="html.parser")
    bs_data = BeautifulSoup(site_bytearray,'lxml')
    # Select every label span; item.text yields only the text inside the span,
    # which is why the values after each span are missing from the output.
    tmplist = bs_data.find_all('span',{'class':'sub_heading'})
    for item in tmplist:
        print(item.text)
    sys.exit(0)


if __name__ == "__main__":
    main(sys.argv[1])

In short, how do I extract both District: and AnyState - At Large from <span class="sub_heading">District:</span> AnyState - At Large<br/>, and accumulate the results in a list for further processing?


慕的地6264312
85 views, 2 answers

慕桂英546537

Replace your print statement with one of the following. The value you want is not inside the span; it is the text node that follows it, which BeautifulSoup exposes as item.next_sibling.

Python 3.6+:

print(f'{item.text:<25} {item.next_sibling}')

Python 3 - 3.5:

print('{:<25} {}'.format(item.text, item.next_sibling))

Output:

District:                  AnyState - At Large
Political Highlights:      AnyTown City Council, 19XX-XX
Born:                      June X, 19XX; AnyTown, Calif.
Residence:                 Some Town
Religion:                  Episcopalian
Family:                    Wife, Some Name; two children
Education:                 Some State College, A.A. 19XX; Some Other State College, B.A. 19XX
Elected:                   19XX
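Since the question also asks for the results to be accumulated in a list, here is a minimal sketch of one way to do that with the same next_sibling idea. The function name extract_pairs and the shortened HTML sample are assumptions made for illustration, and the sketch uses the built-in "html.parser" rather than lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the excerpt from the question (illustrative sample only).
HTML = """
<div class="member_biography">
<h3>Biography</h3>
<span class="sub_heading">District:</span> AnyState - At Large<br/>
<span class="sub_heading">Elected:</span> 19XX<br/>
</div>
"""


def extract_pairs(html):
    """Return a list of (label, value) tuples, one per sub_heading span.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for span in soup.find_all("span", class_="sub_heading"):
        label = span.get_text(strip=True)
        sibling = span.next_sibling
        # The value is the bare text node right after the span. If the span
        # is followed by a tag or by nothing, fall back to an empty string.
        value = sibling.strip() if isinstance(sibling, NavigableString) else ""
        pairs.append((label, value))
    return pairs


pairs = extract_pairs(HTML)
for label, value in pairs:
    print(f"{label:<25}{value}")
```

Stripping the sibling removes the leading space and trailing newline that the raw text node carries, so the values are ready for further processing. If dictionary access is more convenient, dict(pairs) should work as long as the labels are unique.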