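How can I extract information from this page source? I want to extract the name, address, and course from this link

I want to extract the name, address, course, and institute type from the HTML below. I suspect the table is the reason I can't get it to work: every time I try, I get back an empty list. I don't know what to do next.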


<div class="row">

  <div class="col-md-12">

    <div class="panel panel-default">

      <div class="panel-body ">

        <div class="row">

          <div id="ContentPlaceHolder1_pnldefault">


            <table id="ContentPlaceHolder1_dlstCollege" class="table table-bordered table responsive" cellspacing="0" style="border-collapse:collapse;">

              <tr>

                <td>

                  <input type="hidden" name="ctl00$ContentPlaceHolder1$dlstCollege$ctl00$hdnInstituteId" id="ContentPlaceHolder1_dlstCollege_hdnInstituteId_0" value="968  " />

                  <a id="ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0" href="CollegeDetailedInformation.aspx?Inst=968  ">**A R INSTITUTE OF PHARMACY , BIJNOR (968)**</a>

                  <br />

                  <b>Location:</b>

                  <span id="ContentPlaceHolder1_dlstCollege_lblAddress_0">**TAJPUR** </span>

                  <br />

                  <b>Course:</b>

                  <span id="ContentPlaceHolder1_dlstCollege_lblCourse_0">**B.Pharm**,</span>

                  <br />

                  <b>Category:</b>

                  <span id="ContentPlaceHolder1_dlstCollege_lblInstituteType_0">**Private**</span>

                  <br />

                  <b>Web Address:</b>

                  <a id="lnkBtnWebURL" href='' target="_blank"></a>

                  <br />

                </td>

              </tr>


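import requests
from bs4 import BeautifulSoup

res = requests.get('http://kyc.aktu.ac.in/')
soup = BeautifulSoup(res.content, 'html.parser')

# find_all() already returns the matching <a> tags themselves
weblinks = soup.find_all('a', attrs={'id': 'ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0'})

pagelinks = []
for link in weblinks:
    # read href straight from the <a>; there is no nested <a> to search for
    pagelinks.append(link.get('href'))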


qq_笑_17
1 Answer

慕妹3242003

Try this:

from bs4 import BeautifulSoup as bs

# The HTML fragment from the question. Triple quotes let the string hold both quote styles.
html = """<div class="row"><div class="col-md-12"><div class="panel panel-default"><div class="panel-body "><div class="row"><div id="ContentPlaceHolder1_pnldefault">
<table id="ContentPlaceHolder1_dlstCollege" class="table table-bordered table responsive" cellspacing="0" style="border-collapse:collapse;"><tr><td>
<input type="hidden" name="ctl00$ContentPlaceHolder1$dlstCollege$ctl00$hdnInstituteId" id="ContentPlaceHolder1_dlstCollege_hdnInstituteId_0" value="968&nbsp; " />
<a id="ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0" href="CollegeDetailedInformation.aspx?Inst=968&nbsp; ">**A R INSTITUTE OF PHARMACY , BIJNOR (968)**</a><br />
<b>Location:</b><span id="ContentPlaceHolder1_dlstCollege_lblAddress_0">**TAJPUR** </span><br />
<b>Course:</b><span id="ContentPlaceHolder1_dlstCollege_lblCourse_0">**B.Pharm**,</span><br />
<b>Category:</b><span id="ContentPlaceHolder1_dlstCollege_lblInstituteType_0">**Private**</span><br />
<b>Web Address:</b><a id="lnkBtnWebURL" href='' target="_blank"></a><br />
</td></tr>"""

# 'lxml' requires the lxml package; the built-in 'html.parser' also works here
soup = bs(html, 'lxml')

# Each field has a unique id, so look it up directly and strip surrounding whitespace
name = soup.find('a', id='ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0').text.strip()
address = soup.find('span', id='ContentPlaceHolder1_dlstCollege_lblAddress_0').text.strip()
course = soup.find('span', id='ContentPlaceHolder1_dlstCollege_lblCourse_0').text.strip()
institute_type = soup.find('span', id='ContentPlaceHolder1_dlstCollege_lblInstituteType_0').text.strip()

print(name)
print(address)
print(course)
print(institute_type)

Output:

**A R INSTITUTE OF PHARMACY , BIJNOR (968)**
**TAJPUR**
**B.Pharm**,
**Private**
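The lookups above target one row by its exact id (the `_0` suffix). If the page lists several institutes with ids ending in `_0`, `_1`, and so on (an assumption based on the naming pattern, not something the question confirms), a prefix match on the id can collect every row. The following is only a sketch: `parse_colleges`, `text_of`, `PREFIX`, and `BASE_URL` are hypothetical names, the second sample row is invented for illustration, and the sample omits the `**` emphasis markers for readability.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumed base URL (taken from the question), used only to resolve relative hrefs.
BASE_URL = "http://kyc.aktu.ac.in/"

# Shared id prefix seen in the question's markup.
PREFIX = "ContentPlaceHolder1_dlstCollege_"

# Trimmed-down sample in the same shape as the question's HTML.
# The second row is made up purely to show the loop handling more than one entry.
SAMPLE_HTML = """
<table id="ContentPlaceHolder1_dlstCollege">
  <tr><td>
    <a id="ContentPlaceHolder1_dlstCollege_hlpkInstituteName_0"
       href="CollegeDetailedInformation.aspx?Inst=968">A R INSTITUTE OF PHARMACY , BIJNOR (968)</a>
    <span id="ContentPlaceHolder1_dlstCollege_lblAddress_0">TAJPUR </span>
    <span id="ContentPlaceHolder1_dlstCollege_lblCourse_0">B.Pharm,</span>
    <span id="ContentPlaceHolder1_dlstCollege_lblInstituteType_0">Private</span>
  </td></tr>
  <tr><td>
    <a id="ContentPlaceHolder1_dlstCollege_hlpkInstituteName_1"
       href="CollegeDetailedInformation.aspx?Inst=000">EXAMPLE COLLEGE (000)</a>
    <span id="ContentPlaceHolder1_dlstCollege_lblAddress_1">EXAMPLE TOWN</span>
    <span id="ContentPlaceHolder1_dlstCollege_lblCourse_1">B.Tech,</span>
    <span id="ContentPlaceHolder1_dlstCollege_lblInstituteType_1">Government</span>
  </td></tr>
</table>
"""


def text_of(cell, tag, field):
    """Return the cleaned text of the first <tag> in this cell whose id starts with PREFIX + field."""
    node = cell.select_one(f'{tag}[id^="{PREFIX}{field}_"]')
    if node is None:
        return None
    # Strip whitespace, then the trailing comma that appears after the course name.
    return node.get_text(strip=True).rstrip(",")


def parse_colleges(html, base_url=BASE_URL):
    """Collect one dict per table cell that contains an institute-name link."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for cell in soup.select("table#ContentPlaceHolder1_dlstCollege td"):
        link = cell.select_one(f'a[id^="{PREFIX}hlpkInstituteName_"]')
        if link is None:
            continue  # skip cells without an institute entry
        rows.append({
            "name": link.get_text(strip=True),
            "url": urljoin(base_url, link.get("href", "").strip()),
            "address": text_of(cell, "span", "lblAddress"),
            "course": text_of(cell, "span", "lblCourse"),
            "institute_type": text_of(cell, "span", "lblInstituteType"),
        })
    return rows


if __name__ == "__main__":
    for row in parse_colleges(SAMPLE_HTML):
        print(row)
```

To run this against the live page, the HTML string would come from something like `requests.get(...).text` instead of `SAMPLE_HTML`. If the result is still an empty list, it is worth checking whether the table is actually present in the response body, since the listing may only be returned after a form submission or a script runs.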