How to use html.parser

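Hi everyone, I'm new to Python and I'm trying to use Python's html.parser module. I want to scrape this website and use html.parser to get the URLs, deal names, and prices, which sit inside li tags: https://www.mcdelivery.com.pk/pk/browse/menu.html. After getting the URLs, I want to append them to the base URL and fetch the deals with their prices from that site.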


import urllib.request
import urllib.parse
import re
from html.parser import HTMLParser

url = 'https://www.mcdelivery.com.pk/pk/browse/menu.html'
values = {'daypartId': '1', 'catId': '1'}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')  # data should be bytes
req = urllib.request.Request(url, data)
resp = urllib.request.urlopen(req)
respData = resp.read()
list1 = re.findall(r'<div class="product-cost"(.*?)</div>', str(respData))
for eachp in list1:
    print(eachp)

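I tried using a regular expression to grab the class, but it failed. Now I'm trying to work out how to do this with html.parser. I know BeautifulSoup and Scrapy make the job easier, but I'm trying to do it in bare Python, so please skip third-party libraries. I really need help; I'm stuck.

html.parser code (updated):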


from html.parser import HTMLParser
import urllib.request
import html.parser

# Import HTML from a URL
url = urllib.request.urlopen(
    "https://www.mcdelivery.com.pk/pk/browse/menu.html")
html = url.read().decode()
url.close()


class MyParser(html.parser.HTMLParser):
    def __init__(self, html):
        self.matches = []
        self.match_count = 0
        super().__init__()

    def handle_data(self, data):
        self.matches.append(data)
        self.match_count += 1

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if attrs.get("product-cost"):
                self.handle_data()
            else:
                return


parser = MyParser(html)
parser.feed(html)

for item in parser.matches:
    print(item)


慕码人8056858
1 answer

侃侃尔雅

This is a good start that may need some specific tweaking:

import html.parser


class MyParser(html.parser.HTMLParser):
    def __init__(self):
        super().__init__()
        self.matches = []
        self.match_count = 0
        self.in_cost = False  # True while inside <div class="product-cost">

    def handle_data(self, data):
        # Keep only non-blank text found inside a product-cost div
        if self.in_cost and data.strip():
            self.matches.append(data.strip())
            self.match_count += 1

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # "product-cost" is a value of the class attribute, not an attribute name
        if tag == "div" and "product-cost" in (attrs.get("class") or "").split():
            self.in_cost = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_cost = False

Usage is along the lines of:

request_html = the_request_method(url, ...)
parser = MyParser()
parser.feed(request_html)
for item in parser.matches:
    print(item)
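For the other half of the question (the links inside the li tags, the deal names, and joining each link to the base URL), one possible extension is sketched below. It is only a sketch under assumptions: the class name `MenuParser`, the helper `fetch_html`, and the markup in `SAMPLE_HTML` are hypothetical and were not taken from the real page, so the tag and class checks will probably need adjusting once the actual HTML has been inspected. It matches on the value of the `class` attribute and uses `urllib.parse.urljoin` to turn relative links into absolute ones.

```python
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.mcdelivery.com.pk/pk/browse/menu.html"


def fetch_html(url):
    """Hypothetical helper: download a page and decode it to text."""
    with urllib.request.urlopen(url) as resp:
        # Use the charset reported by the server, falling back to UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class MenuParser(HTMLParser):
    """Collect links and link text inside <li>, plus text inside price <div>s."""

    def __init__(self, base_url, price_class="product-cost"):
        super().__init__()
        self.base_url = base_url
        self.price_class = price_class
        self.links = []    # absolute URLs from <li> ... <a href="...">
        self.names = []    # text of those <a> elements
        self.prices = []   # text inside <div class="product-cost">
        self._li_depth = 0       # > 0 while inside at least one <li>
        self._in_link = False    # True while inside a collected <a>
        self._in_price = False   # True while inside a price <div>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "li":
            self._li_depth += 1
        elif tag == "a" and self._li_depth and attrs.get("href"):
            # urljoin resolves relative hrefs against the page URL
            self.links.append(urljoin(self.base_url, attrs["href"]))
            self._in_link = True
        elif tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.price_class in classes:
                self._in_price = True

    def handle_endtag(self, tag):
        if tag == "li" and self._li_depth:
            self._li_depth -= 1
        elif tag == "a":
            self._in_link = False
        elif tag == "div":
            self._in_price = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_link:
            self.names.append(text)
        elif self._in_price:
            self.prices.append(text)


# Hypothetical markup, used here instead of a live request
SAMPLE_HTML = """
<ul>
  <li><a href="/pk/deal-a.html">Deal A</a>
      <div class="product-cost">Rs 500</div></li>
  <li><a href="deal-b.html">Deal B</a>
      <div class="product-cost special">Rs 650</div></li>
</ul>
"""

parser = MenuParser(BASE_URL)
parser.feed(SAMPLE_HTML)
parser.close()

for name, link, price in zip(parser.names, parser.links, parser.prices):
    print(name, link, price)
```

Against the live site, `parser.feed(fetch_html(BASE_URL))` would replace the sample string. The three lists are only paired up by position, so this works as long as every li holds exactly one link and one price; if that assumption does not hold for the real page, grouping the values per li would be the safer design.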