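I'm trying to get all events, plus some additional metadata for each event, from this page: https://alando-palais.de/events
My problem is that the result (the HTML) doesn't contain the information I'm looking for. I think it is "hidden" behind some PHP script at this URL: https://alando-palais.de/wp/wp-admin/admin-ajax.php
Any ideas how to wait until the page has fully loaded, or what kind of approach I need to use to get the event information?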
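The `vc_gitem` class names in the event markup suggest the list is a WPBakery (Visual Composer) grid that the browser fills in after page load with a POST request to admin-ajax.php. Instead of "waiting" for the page, one option is to replay that request directly. This is a minimal sketch under assumptions: every form field name and value in `GRID_PAYLOAD` (the `vc_get_vc_grid_data` action, `tag`, `data[page_id]`, `data[shortcode_id]`, `_vcnonce`) is a hypothetical placeholder that has to be copied from the real request shown in the browser's developer tools (Network tab, filter on `admin-ajax.php`). A nonce like `_vcnonce` typically expires, so it may need to be read fresh from the events page each time.

```python
import requests

AJAX_URL = "https://alando-palais.de/wp/wp-admin/admin-ajax.php"

# ASSUMPTION: placeholder form fields. Copy the real names and values from the
# admin-ajax.php request visible in the browser's Network tab.
GRID_PAYLOAD = {
    "action": "vc_get_vc_grid_data",        # WordPress dispatches admin-ajax calls by "action"
    "vc_action": "vc_get_vc_grid_data",     # assumed, verify in the Network tab
    "tag": "vc_basic_grid",                 # assumed, verify in the Network tab
    "data[page_id]": "PAGE_ID",             # hypothetical placeholder
    "data[shortcode_id]": "SHORTCODE_ID",   # hypothetical placeholder
    "_vcnonce": "NONCE",                    # hypothetical placeholder, likely expires
}


def build_grid_request(payload=GRID_PAYLOAD):
    """Prepare (but do not send) the form-encoded POST the grid would make."""
    return requests.Request(
        "POST",
        AJAX_URL,
        data=payload,
        headers={"Referer": "https://alando-palais.de/events"},
    ).prepare()


def fetch_grid_html(payload=GRID_PAYLOAD):
    """Send the request and return the response body (expected: an HTML fragment)."""
    with requests.Session() as session:
        response = session.send(build_grid_request(payload), timeout=30)
        response.raise_for_status()
        return response.text
```

If the payload matches what the browser sends, the returned text can be passed to `BeautifulSoup(..., 'html.parser')` just like the events page itself. Replaying the request avoids running a browser; the alternative, rendering the page with a headless browser such as Selenium, also works but is heavier.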
Here is my script so far :-)
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
import re
import requests

if __name__ == '__main__':
    target_url = 'https://alando-palais.de/events'
    # target_url = 'https://alando-palais.de/wp/wp-admin/admin-ajax.php'
    # Fetch the static HTML; content added later by JavaScript is not included
    response = requests.get(target_url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    print(soup)
    # List every link that has an href attribute
    links = soup.find_all('a', href=True)
    for x, link in enumerate(links):
        print(x, link['href'])
    # for image in images:
    #     print(urljoin(target_url, image))
The expected output would be something like:
Date: 08.03.2019
Title: Penthouse Club Special: Maiwai & Friends
img: https://alando-palais.de/wp/wp-content/uploads/2019/02/0803_MaiwaiFriends-500x281.jpg
Here is part of the markup for this result:
<div class="vc_gitem-zone vc_gitem-zone-b vc_custom_1547045488900 originalbild vc-gitem-zone-height-mode-auto vc_gitem-is-link" style="background-image: url(https://alando-palais.de/wp/wp-content/uploads/2019/02/0803_MaiwaiFriends-500x281.jpg) !important;">
<a href="https://alando-palais.de/event/penthouse-club-special-maiwai-friends" title="Penthouse Club Special: Maiwai & Friends" class="vc_gitem-link vc-zone-link"></a> <img src="https://alando-palais.de/wp/wp-content/uploads/2019/02/0803_MaiwaiFriends-500x281.jpg" class="vc_gitem-zone-img" alt=""> <div class="vc_gitem-zone-mini">
<div class="vc_gitem_row vc_row vc_gitem-row-position-top"><div class="vc_col-sm-6 vc_gitem-col vc_gitem-col-align-left"> <div class="vc_gitem-post-meta-field-Datum eventdatum vc_gitem-align-left"> 08.03.2019
</div>
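Once HTML containing these grid items is available (from whichever source), the three fields in the expected output can be pulled out with CSS selectors on the class names above. A minimal sketch, assuming every event uses the same `vc_gitem-zone` / `vc_gitem-link` / `vc_gitem-zone-img` / `eventdatum` structure as this snippet; `SAMPLE_HTML` is the snippet trimmed and closed, and `parse_events` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# The grid item from above, trimmed to the relevant attributes, with closing tags added
SAMPLE_HTML = """
<div class="vc_gitem-zone vc_gitem-zone-b originalbild">
  <a href="https://alando-palais.de/event/penthouse-club-special-maiwai-friends"
     title="Penthouse Club Special: Maiwai &amp; Friends" class="vc_gitem-link vc-zone-link"></a>
  <img src="https://alando-palais.de/wp/wp-content/uploads/2019/02/0803_MaiwaiFriends-500x281.jpg"
       class="vc_gitem-zone-img" alt="">
  <div class="vc_gitem-zone-mini">
    <div class="vc_gitem-post-meta-field-Datum eventdatum vc_gitem-align-left"> 08.03.2019
    </div>
  </div>
</div>
"""


def parse_events(html):
    """Return one dict per grid zone that has both a link and a date."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    # ".vc_gitem-zone" matches the exact class token, not "vc_gitem-zone-mini"
    for zone in soup.select("div.vc_gitem-zone"):
        link = zone.select_one("a.vc_gitem-link")
        img = zone.select_one("img.vc_gitem-zone-img")
        date = zone.select_one("div.eventdatum")
        if link is None or date is None:
            continue  # skip zones that are not complete event tiles
        events.append({
            "date": date.get_text(strip=True),
            "title": link.get("title", ""),
            "url": link.get("href", ""),
            "img": img["src"] if img is not None else None,
        })
    return events


for event in parse_events(SAMPLE_HTML):
    print("Date:", event["date"])
    print("Title:", event["title"])
    print("img:", event["img"])
```

Iterating over the zones rather than searching the whole page for each class keeps the date, title and image of one event together.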