BeautifulSoup can't extract all the HTML

I'm trying to write a program that extracts all the songs in Daily Mix 1 on Spotify. I know the logic I need to use, but I can't get the full page source.


Here is the code I wrote:


import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"}
result = requests.get("https://open.spotify.com/playlist/37i9dQZF1E38L6D2gtQHWw", headers=headers)

src = result.content
soup = BeautifulSoup(src, 'lxml')

print(soup.prettify())

This is the output I get:

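The headers I use work for other sites such as Amazon and Wikipedia, so I don't think they are the problem. I also don't think the problem is related to JavaScript, because in other programs I've written to scrape websites such as Amazon (which also contain a lot of <script> tags), the page source comes through perfectly well.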
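One way to test the JavaScript hypothesis directly, rather than inferring it from other sites, is to check whether a song title that is visible in the browser also appears in the visible text of the HTML that `requests` returns. A minimal sketch using BeautifulSoup; `is_server_rendered` and `appears_in_scripts` are hypothetical helper names, and the sample HTML is made up for illustration.

```python
from bs4 import BeautifulSoup


def is_server_rendered(html: str, needle: str) -> bool:
    """Return True if `needle` appears in the page's visible text.

    <script> and <style> tags are removed first: text that only exists
    inside them is not markup the browser shows until JavaScript runs.
    """
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):  # soup([...]) is shorthand for find_all
        tag.decompose()
    return needle in soup.get_text(" ")


def appears_in_scripts(html: str, needle: str) -> bool:
    """Return True if `needle` appears inside any <script> tag's contents."""
    soup = BeautifulSoup(html, "html.parser")
    return any(needle in (tag.string or "") for tag in soup.find_all("script"))


if __name__ == "__main__":
    # Made-up "app shell" page: an empty container plus data in a script.
    shell = '<div id="root"></div><script>window.data = {"name": "Tu Hi Haqeeqat"};</script>'
    print(is_server_rendered(shell, "Tu Hi Haqeeqat"))  # False
    print(appears_in_scripts(shell, "Tu Hi Haqeeqat"))  # True
```

Called as `is_server_rendered(result.text, title)` with a title copied from the browser, a False result suggests the content is filled in by JavaScript after the page loads, either from data embedded in a `<script>` tag (which `appears_in_scripts` would detect) or from a separate API request. The number of `<script>` tags on a page says little by itself; what matters is whether the data you want is in the markup.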

Please tell me what the problem is.

PS - Please don't recommend Selenium or Scrapy in your solution.


皈依舞
1 Answer

隔江千里

The data you're trying to scrape is populated by JavaScript, so you won't find it in the page source. You can, however, get it through the API the website itself uses:

import json
import requests
from bs4 import BeautifulSoup as bs

base_url = 'https://open.spotify.com/playlist/37i9dQZF1E38L6D2gtQHWw'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36"}

# Get the access token first, so it can be sent in the header to the API endpoint
page = requests.get(base_url, headers=headers)
soup = bs(page.text, 'html.parser')
access_token_tag = soup.find('script', {'id': 'config'})
json_obj = json.loads(access_token_tag.text)
access_token_text = json_obj['accessToken']

endpoint = "https://api.spotify.com/v1/playlists/37i9dQZF1E38L6D2gtQHWw"
headers.update({'authorization': f"Bearer {access_token_text}",
                'referer': base_url,
                'accept': 'application/json',
                'app-platform': 'WebPlayer'})
url_parameters = {'type': 'track,episode', 'market': 'EG'}
data = requests.get(endpoint, params=url_parameters, headers=headers).json()

tracks = data['tracks']['items']
for index, track in enumerate(tracks, 1):
    print(f"{index} - {track['track']['name']}")

Output:

1 - Tu Hi Haqeeqat
2 - Hasi - Female Version
3 - Kabhi Jo Baadal Barse
4 - Tere Bin Nahi Laage (Male Version)
5 - Dekhte Dekhte (Rahat Fateh Ali Khan Version) [From "Batti Gul MeterChalu"]
6 - Panchhi Bole
7 - Jame Raho
8 - Banjaara (From "Ek Villain")
9 - Mitwa
10 - Agar Tu Hota (From "Baaghi")
11 - Aasan Nahin Yahan
12 - Jiyo Re Bahubali
13 - Pyaar Manga Hai
14 - Kaun Hain Voh
15 - Mamta Se Bhari
16 - Zehnaseeb
17 - Dil Ibaadat
18 - Tu Hi Tu (Reprise)
19 - Haule Haule
20 - Manohari
21 - Ilahi (From "Yeh Jawaani Hai Deewani")
22 - Humsafar (From "Badrinath Ki Dulhania")
23 - Kiya Kiya
24 - Sunn Raha Hai (Female)
25 - Phir Le Aya Dil
26 - Tere Naal Nachna (From "Nawabzaade")
27 - Galliyan (From "Ek Villain")
28 - Valentine's Mashup 2019(Remix By Kedrock,Sd Style)
29 - Halka Halka
30 - Raabta (From "Agent Vinod")
31 - Mere Bina - Unplugged
32 - Agar Tum Saath Ho-Maahi Ve
33 - Swapn Sunehere
34 - Radha
35 - Behti Hawa Sa Tha Woh
36 - Mere Rashke Qamar
37 - Kehta Hai Pal Pal
38 - Maana Ke Hum Yaar Nahin
39 - Khoya Hain
40 - O Re Piya
41 - Jal Rahin Hain
42 - Zero Hour Mashup 2015(Remix By Dj Kiran Kamath)
43 - Aashiq Banaya Aapne
44 - Bikhri Bikhri
45 - Maula Mere Lele Meri Jaan
46 - Yadaan Teriyaan (Version 2)
47 - Tujh Mein Rab Dikhta Hai
48 - Veeron Ke Veer Aa
49 - Bolo Har Har Har (feat. Mohit Chauhan, Sukhwinder Singh, Badshah, Megha Sriram Dalton, Anugrah, Sandeep Shrivastava)
50 - Main Agar
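The loop in the answer prints only the `items` on the first page of the playlist's `tracks` object. In the Spotify Web API, `tracks` is a paging object whose `next` field holds the URL of the following page (or null on the last one), so a longer playlist needs a loop. A minimal sketch, assuming the same `headers` as in the answer; `collect_track_names` and its `fetch_json` parameter are hypothetical names, and the fetcher is passed in so the paging logic can be tried without network access. Entries without a `track` object are skipped defensively.

```python
from typing import Any, Callable, Dict, List

JSON = Dict[str, Any]


def collect_track_names(first_page: JSON, fetch_json: Callable[[str], JSON]) -> List[str]:
    """Walk a Spotify-style paging object and return every track name.

    `first_page` is the `tracks` object from the playlist response; each
    later page is fetched from the URL in the previous page's `next` field.
    """
    names: List[str] = []
    page: JSON = first_page
    while True:
        for item in page.get("items", []):
            track = item.get("track")
            if track:  # skip entries that carry no track object
                names.append(track["name"])
        next_url = page.get("next")
        if not next_url:  # None or missing on the last page
            break
        page = fetch_json(next_url)
    return names


if __name__ == "__main__":
    # Offline demo: a dict of fake pages stands in for the HTTP client.
    pages = {"page2": {"items": [{"track": {"name": "Haule Haule"}}], "next": None}}
    first = {"items": [{"track": {"name": "Tu Hi Haqeeqat"}}, {"track": None}], "next": "page2"}
    print(collect_track_names(first, pages.__getitem__))
    # ['Tu Hi Haqeeqat', 'Haule Haule']
```

Against the live API, the call would look something like `collect_track_names(data['tracks'], lambda url: requests.get(url, headers=headers).json())`. Injecting the fetcher keeps the paging logic separate from authentication and HTTP details.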