Unexpected results when trying to get metadata with BeautifulSoup

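OK, here is what I'm trying to do. I'm quite new to Python and only just getting the hang of it. Anyway, with this little tool I'm trying to extract data from a page. In this case, I want the user to enter a URL and have it return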

<meta content=" % Likes, % Comments - @% on Instagram: “post description []”" name="description" />

but with each % replaced by the post's number of likes, comments, and so on.

Here is my full code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
import re

url = "https://www.instagram.com/p/BsOGulcndj-/"
page2 = requests.get(url)
soup2 = BeautifulSoup(page2.content, 'html.parser')
result = soup2.findAll('content', attrs={'content': 'description'})
print(result)

But whenever I run it, I get []. What am I doing wrong?

慕无忌1623718


2 Answers

ITMISS

The correct way to match these tags is:

result = soup2.findAll('meta', content=True, attrs={"name": "description"})

However, html.parser does not parse <meta> tags correctly. It does not realize they are self-closing, so it includes most of the rest of <head> in the result. I switched to:

soup2 = BeautifulSoup(page2.content, 'html5lib')

and the search above then returns:

[<meta content="46.3m Likes, 2.6m Comments - EGG GANG 🌍 (@world_record_egg) on Instagram: “Let’s set a world record together and get the most liked post on Instagram. Beating the current…”" name="description"/>]
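To make the difference concrete: the first argument to findAll is a tag name and attrs maps attribute names to values, so the query in the question looks for a <content> tag whose content attribute equals "description", which does not exist. Below is a minimal sketch that runs against an inline HTML string instead of the live page. SAMPLE_HTML, get_description and parse_counts are hypothetical names made up for illustration, and the regular expression assumes the description keeps the "N Likes, N Comments" wording shown in the question. The sketch reads only the content attribute, so it uses the built-in html.parser by default; passing "html5lib" as the parser should also work, provided that package is installed separately.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the tag shown in the question.
SAMPLE_HTML = (
    "<html><head>"
    '<meta content="46.3m Likes, 2.6m Comments - EGG GANG (@world_record_egg) '
    'on Instagram: post description" name="description" />'
    "<title>Sample</title>"
    "</head><body></body></html>"
)


def get_description(html, parser="html.parser"):
    """Return the content attribute of <meta name="description">, or None."""
    soup = BeautifulSoup(html, parser)
    # Match on the tag name 'meta' and on the attribute name="description".
    tag = soup.find("meta", attrs={"name": "description"})
    return tag.get("content") if tag is not None else None


def parse_counts(description):
    """Extract like and comment counts (kept as strings) from the description."""
    match = re.search(
        r"([\d.,]+[kmb]?)\s+Likes,\s+([\d.,]+[kmb]?)\s+Comments",
        description,
        re.IGNORECASE,
    )
    if match is None:
        return None
    return {"likes": match.group(1), "comments": match.group(2)}


# The query from the question: there is no <content> tag, so nothing matches.
original_query = BeautifulSoup(SAMPLE_HTML, "html.parser").find_all(
    "content", attrs={"content": "description"}
)

description = get_description(SAMPLE_HTML)
counts = parse_counts(description) if description else None

print(original_query)
print(description)
print(counts)
```

For the live page, the same functions could be called on page2.content from the question's code. Whether the server returns this meta tag for a given request is outside what this sketch checks, which is why get_description returns None when the tag is missing.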
