Reading an XML file from a URL in Python


I want to read the integers contained in the count tags.

This is the code I wrote:

import xml.etree.ElementTree as ET
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
content1 = urllib.request.urlopen(url, context = ctx).read()
soup = BeautifulSoup(content1, 'html.parser')
tree = ET.fromstring(soup)
tags = tree.findall('count')
print(tags)

It raises an error:

Traceback (most recent call last):
  File "C:\Users\Name\Desktop\Py4e\Me\Assi_15_01.py", line 15, in <module>
    tree = ET.fromstring(soup)
  File "C:\Users\Name\AppData\Local\Programs\Python\Python38-32\lib\xml\etree\ElementTree.py", line 1320, in XML
    parser.feed(text)
TypeError: a bytes-like object is required, not 'BeautifulSoup'

What can I do?

More information: http://py4e-data.dr-chuck.net/comments_42.xml

米琪卡哇伊


2 Answers

SMILET

There is no need for xml.etree; just use BeautifulSoup to select all the <count> tags:

import requests
from bs4 import BeautifulSoup

url = 'http://py4e-data.dr-chuck.net/comments_42.xml'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

for c in soup.select('count'):
    print(int(c.text))

Prints (one number per line, shown here in rows):

97 97 90 90 88 87 87 80 79 79
78 76 76 72 72 66 66 65 65 64
61 61 59 58 57 57 54 51 49 47
40 38 37 36 36 32 25 24 22 21
19 18 18 14 12 12 9 7 3 2
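For comparison, here is a minimal sketch of how the ElementTree approach from the question might be repaired. The TypeError arises because ET.fromstring expects a str or bytes value, so the raw response body can be passed to it directly without going through BeautifulSoup. The sample XML and the read_counts helper are hypothetical: the sample only assumes that the <count> elements sit a few levels below the root, as they appear to in comments_42.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for the downloaded file (assumed layout).
SAMPLE_XML = b"""<commentinfo>
  <comments>
    <comment><name>Alice</name><count>97</count></comment>
    <comment><name>Bob</name><count>90</count></comment>
    <comment><name>Carol</name><count>2</count></comment>
  </comments>
</commentinfo>"""


def read_counts(xml_bytes):
    """Return the integer value of every <count> element in the document."""
    # fromstring accepts bytes or str, but not a BeautifulSoup object.
    root = ET.fromstring(xml_bytes)
    # './/count' searches all descendants; a bare 'count' would match
    # only direct children of the root and return an empty list here.
    return [int(node.text) for node in root.findall('.//count')]


counts = read_counts(SAMPLE_XML)
print(counts)
print(sum(counts))
```

With the real file, the same helper could be called as read_counts(urllib.request.urlopen(url).read()), using the url from the question.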


白衣非少年

I don't think you need ElementTree. Just switch BeautifulSoup to the lxml parser (change 'html.parser' to 'lxml') and call the find_all method on the soup rather than on the tree, i.e. soup.find_all('count').
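The suggestion above could be sketched as follows. This is an illustration under assumptions, not the answerer's exact code: the sample XML is hypothetical, and the fallback to 'html.parser' is added here only so the snippet still runs when the lxml package is not installed. The method on a BeautifulSoup object is spelled find_all.

```python
from bs4 import BeautifulSoup

# Hypothetical sample standing in for the downloaded file (assumed layout).
SAMPLE_XML = """<commentinfo>
  <comments>
    <comment><name>Alice</name><count>97</count></comment>
    <comment><name>Bob</name><count>90</count></comment>
    <comment><name>Carol</name><count>2</count></comment>
  </comments>
</commentinfo>"""

# Prefer the lxml parser as suggested; fall back to the built-in parser
# if lxml is unavailable (the fallback is an assumption of this sketch).
try:
    import lxml  # noqa: F401
    parser = 'lxml'
except ImportError:
    parser = 'html.parser'

soup = BeautifulSoup(SAMPLE_XML, parser)

# find_all returns every matching tag anywhere in the document.
counts = [int(tag.text) for tag in soup.find_all('count')]
print(counts)
```

To run it against the real data, SAMPLE_XML would be replaced with the bytes fetched from the URL in the question.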
