How to use html.parser

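Hi everyone, I'm new to Python and I'm trying to use Python's html.parser module. I want to scrape this website and use html.parser to get the URLs, deal names, and prices, which sit inside li tags: https://www.mcdelivery.com.pk/pk/browse/menu.html. After getting the URLs, I want to append them to the base URL and fetch the deals with their prices from the site.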

import urllib.request
import urllib.parse
import re
from html.parser import HTMLParser

url = 'https://www.mcdelivery.com.pk/pk/browse/menu.html'
values = {'daypartId': '1', 'catId': '1'}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8') # data should be bytes
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()

list1 = re.findall(r'<div class="product-cost"(.*?)</div>', str(respData))
for eachp in list1:
    print(eachp)

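I was using a regular expression to get the class, but I failed. Now I'm trying to figure out how to do this with html.parser. I know the job gets easier with BeautifulSoup and Scrapy, but I'm trying to do it with bare Python, so please skip third-party libraries. I really need help. I'm stuck.

html.parser code (updated):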

from html.parser import HTMLParser
import urllib.request
import html.parser

# Import HTML from a URL
url = urllib.request.urlopen(
    "https://www.mcdelivery.com.pk/pk/browse/menu.html")
html = url.read().decode()
url.close()

class MyParser(html.parser.HTMLParser):
    def __init__(self, html):
        self.matches = []
        self.match_count = 0
        super().__init__()

    def handle_data(self, data):
        self.matches.append(data)
        self.match_count += 1

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if attrs.get("product-cost"):
                self.handle_data()
        else:
            return

parser = MyParser(html)
parser.feed(html)
for item in parser.matches:
    print(item)

慕码人8056858


1 answer

侃侃尔雅

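This is a good start that may need some specific adjustments:

import html.parser

class MyParser(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.matches = []
        self.match_count = 0
        self.in_cost = False  # True while inside <div class="product-cost">

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; "product-cost" is a value
        # of the class attribute, not an attribute name
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "product-cost" in classes:
            self.in_cost = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_cost = False

    def handle_data(self, data):
        # The parser calls this itself for every text node; keep only the
        # text found inside a product-cost div
        if self.in_cost and data.strip():
            self.matches.append(data.strip())
            self.match_count += 1

Usage is along the lines of:

request_html = the_request_method(url, ...)
parser = MyParser()
parser.feed(request_html)
for item in parser.matches:
    print(item)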
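The question also asks for each deal's URL and name, with the URL joined to the base URL. Below is a minimal sketch of one way to do that with only the standard library, tracking which element the parser is currently inside. The sample markup (an `li` holding an `a` with the deal name and a `div class="product-cost"` with the price) is an assumption for illustration and may not match the site's real HTML; `DealParser`, `SAMPLE_HTML`, and the `item.html?id=...` links are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.mcdelivery.com.pk/pk/browse/menu.html"

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<ul>
  <li><a href="item.html?id=1">Deal One</a>
      <div class="product-cost">Rs 100</div></li>
  <li><a href="item.html?id=2">Deal Two</a>
      <div class="product-cost">Rs 250</div></li>
</ul>
"""


class DealParser(HTMLParser):
    """Collects one dict (url, name, price) per <li> element."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.deals = []
        self.current = None  # dict for the <li> being parsed, else None
        self.field = None    # "name" or "price" while inside that element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self.current = {"url": None, "name": "", "price": ""}
        elif self.current is None:
            return  # ignore everything outside an <li>
        elif tag == "a" and attrs.get("href"):
            # urljoin resolves a relative href against the base URL
            self.current["url"] = urljoin(self.base_url, attrs["href"])
            self.field = "name"
        elif tag == "div" and "product-cost" in (attrs.get("class") or "").split():
            self.field = "price"

    def handle_endtag(self, tag):
        if tag in ("a", "div"):
            self.field = None
        elif tag == "li" and self.current is not None:
            self.deals.append(self.current)
            self.current = None

    def handle_data(self, data):
        # Text is stored only while inside a tracked <a> or price <div>
        if self.current is not None and self.field:
            self.current[self.field] += data.strip()


parser = DealParser(BASE_URL)
parser.feed(SAMPLE_HTML)
for deal in parser.deals:
    print(deal["url"], deal["name"], deal["price"])
```

For a live page, the decoded response text would be fed in place of `SAMPLE_HTML`, and the tag and class checks would need adjusting to whatever the page actually contains.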
