Unable to fetch the names of different suppliers from a webpage

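I've written a Python script that uses a POST request to fetch the names of different suppliers from a webpage. Unfortunately it fails with AttributeError: 'NoneType' object has no attribute 'text', even though, as far as I can tell, I'm doing things the right way.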

Link to the website

To populate the content, you need to click the Search button, as shown in the image.

http://img1.mukewang.com/61e687930001de1009300106.jpg

This is what I've tried so far:


import requests
from bs4 import BeautifulSoup

url = "https://www.gebiz.gov.sg/ptn/supplier/directory/index.xhtml"

r = requests.get(url)
soup = BeautifulSoup(r.text,"lxml")

payload = {
    'contentForm': 'contentForm',
    'contentForm:j_idt225_listButton2_HIDDEN-INPUT': '',
    'contentForm:j_idt161_inputText': '',
    'contentForm:j_idt164_SEARCH': '',
    'contentForm:j_idt167_selectManyMenu_SEARCH-INPUT': '',
    'contentForm:j_idt167_selectManyMenu-HIDDEN-INPUT': '',
    'contentForm:j_idt167_selectManyMenu-HIDDEN-ACTION-INPUT': '',
    'contentForm:search': 'Search',
    'contentForm:j_idt185_select': 'SUPPLIER_NAME',
    'javax.faces.ViewState': soup.select_one('[id="javax.faces.ViewState"]')['value']
}

res = requests.post(url,data=payload,headers={
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
    })
sauce = BeautifulSoup(res.text,"lxml")
item = sauce.select_one(".form2_ROW").text
print(item)

Getting even just this part would be fine: 8121 results found.


Full traceback:


Traceback (most recent call last):
  File "C:\Users\WCS\AppData\Local\Programs\Python\Python37-32\general_demo.py", line 27, in <module>
    item = sauce.select_one(".form2_ROW").text
AttributeError: 'NoneType' object has no attribute 'text'
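
The traceback means that select_one found no element matching .form2_ROW in the response and returned None, so the .text access failed. Below is a minimal sketch of guarding against that case. The HTML snippet is made up for illustration (it is not the real GeBIZ markup), and text_or_none is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = soup.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


# Made-up sample markup, for illustration only.
sample_html = '<div class="form2_ROW"> 8121 results found. </div>'
soup = BeautifulSoup(sample_html, "html.parser")

found = text_or_none(soup, ".form2_ROW")        # selector matches
missing = text_or_none(soup, ".no_such_class")  # selector does not match
print(found)
print(missing)
```

A None result here is a hint that the server returned a different page than the browser shows, which is worth checking before changing the selector.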


慕姐8265434
1 Answer

墨色风雨

You need to find a way to obtain the cookies. The following currently works for me across multiple requests.

import requests
from bs4 import BeautifulSoup

url = "https://www.gebiz.gov.sg/ptn/supplier/directory/index.xhtml"

headers = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0',
    'Referer' : 'https://www.gebiz.gov.sg/ptn/supplier/directory/index.xhtml',
    'Accept' : 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding' : 'gzip, deflate, br',
    'Accept-Language' : 'en-US,en;q=0.9',
    'Cache-Control' : 'max-age=0',
    'Connection' : 'keep-alive',
    'Cookie' : '__cfduid=d3fe47b7a0a7f3ef307c266817231b5881555951761; wlsessionid=pFpF87sa9OCxQhUzwQ3lXcKzo04j45DP3lIVYylizkFMuIbGi6Ka!1395223647; BIGipServerPTN2_PRD_Pool=52519072.47873.0000'
}

# Use one session so the GET and the POST share the same connection and cookies
with requests.Session() as s:
    r = s.get(url, headers=headers)
    soup = BeautifulSoup(r.text,"lxml")

    payload = {
        'contentForm': 'contentForm',
        'contentForm:search': 'Search',
        'contentForm:j_idt185_select': 'SUPPLIER_NAME',
        'javax.faces.ViewState': soup.select_one('[id="javax.faces.ViewState"]')['value']
    }

    res = s.post(url,data=payload,headers=headers)
    sauce = BeautifulSoup(res.text,"lxml")
    item = sauce.select_one(".formOutputText_HIDDEN-LABEL.outputText_TITLE-BLACK").text
    print(item)
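
The payload step above relies on the javax.faces.ViewState hidden field being present in the first response. Here is a small sketch of extracting it defensively, so a missing field produces a clear error instead of a TypeError. The form field names are copied from the code above; the sample markup and the helper name build_payload are assumptions for illustration only.

```python
from bs4 import BeautifulSoup


def build_payload(html):
    """Build the search form payload from the page HTML.

    Raises ValueError if the ViewState hidden input is missing.
    """
    soup = BeautifulSoup(html, "html.parser")
    view_state = soup.select_one('[id="javax.faces.ViewState"]')
    if view_state is None or not view_state.has_attr("value"):
        raise ValueError("javax.faces.ViewState not found in response")
    return {
        "contentForm": "contentForm",
        "contentForm:search": "Search",
        "contentForm:j_idt185_select": "SUPPLIER_NAME",
        "javax.faces.ViewState": view_state["value"],
    }


# Made-up markup standing in for the real page.
sample_html = (
    '<form><input type="hidden" id="javax.faces.ViewState" value="abc123"></form>'
)
payload = build_payload(sample_html)
print(payload["javax.faces.ViewState"])
```

In the session code, this would be called as build_payload(r.text) in place of the inline dictionary.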