I have written a Python script using requests and BeautifulSoup to parse the profile names, and the links to those profiles, from a webpage. The content appears to be generated dynamically, but it is present in the page source. So I tried the following approach, but unfortunately I get nothing.
My attempt so far:
import requests
from bs4 import BeautifulSoup

URL = 'https://www.century21.com/real-estate-agents/Dallas,TX'

headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,bn;q=0.8',
    'cache-control': 'max-age=0',
    'cookie': 'JSESSIONID=8BF2F6FB5603A416DCFBAB8A3BB5A79E.app09-c21-id8; website_user_id=1255553501;',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.86 Safari/537.36'
}

def get_info(link):
    res = requests.get(link, headers=headers)
    soup = BeautifulSoup(res.text, "lxml")
    for item in soup.select(".media__content"):
        profileUrl = item.get("href")
        profileName = item.select_one("[itemprop='name']").get_text()
        print(profileUrl, profileName)

if __name__ == '__main__':
    get_info(URL)
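One thing I noticed while debugging: if `.media__content` matches a container element (e.g. a `<div>`) rather than an `<a>` tag, then `item.get("href")` returns `None`, because `href` lives on the nested anchor, not on the container. A minimal, self-contained sketch using made-up markup (an assumption about the structure, not copied from the real page) shows the difference:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the href sits on an <a> nested inside .media__content.
html = """
<div class="media__content">
  <a href="/profile/jane-doe">
    <span itemprop="name">Jane Doe</span>
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
item = soup.select_one(".media__content")

# Asking the container itself for href yields nothing...
container_href = item.get("href")            # None

# ...while selecting the nested anchor first returns the link.
anchor_href = item.select_one("a").get("href")  # "/profile/jane-doe"

print(container_href, anchor_href)
```

So the selector may simply be landing one level too high in the tree, even though the data is in the page source.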
How can I get the content from that page?