
net/html parses the document but returns a nil *html.Node no matter what

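I am trying to process an HTML document. The problem is that Parse from golang.org/x/net/html returns an *html.Node with a nil value, and err is nil as well. That seems odd: if Parse could not process the input properly, I would expect to get an error.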


Here is my code:


package main

import (
    "bytes"
    "golang.org/x/net/html"
    "io/ioutil"
    "log"
)

func main() {
    html, err := ioutil.ReadFile("html/simple_01.html")
    if e != nil {
        fmt.Fatal(e)
    }
    doc, err := html.Parse(bytes.NewReader(html))
    if err != nil {
        log.Fatal(err)
    }
    // locate <body>
    var body *html.Node
    for s := doc.NextSibling; s != nil; s = s.NextSibling {
        if s.Data == "body" {
            body = s
            break
        }
    }
    log.Println(body)
}

log.Println(body) prints nil. Printing doc also prints nil, which is strange.


This is the HTML document I am testing with:


<!DOCTYPE html>
<html>

<head>
    <meta charset='utf-8'>
    <title>Sample page - 01</title>
</head>

<body>
    <p>Aspernatur vel molestiae eius sed sunt doloremque. Ipsa sed voluptate expedita tempore id. Ab nobis delectus magnam.</p>
    <p>Beatae id mollitia nesciunt nesciunt qui explicabo cum. Aspernatur est molestiae laudantium assumenda consequuntur. Odit mollitia non inventore iusto. Id nihil voluptatem vitae. Fugit odio dolores atque sed.</p>
    <p>Qui dolorem ipsum fugit vitae consequuntur suscipit debitis iste. Dignissimos impedit nobis quas facilis. Quia dignissimos perspiciatis quia debitis. Rerum beatae repellat architecto nostrum nulla facere rerum.</p>
    <p>Quas natus ad qui excepturi dolorem. Quas dolorum dolores voluptatem distinctio quisquam culpa et. Ipsam voluptatem suscipit earum reprehenderit. Quos laudantium occaecati quis similique. Numquam rerum sunt rerum et necessitatibus. Laboriosam modi iure praesentium voluptates atque adipisci et.</p>
    <p>Blanditiis dolores nemo quos voluptatem quo quia modi. Quia et alias nesciunt sint voluptatum omnis. Nihil minima ipsa magnam qui amet ea. Blanditiis laborum nihil tempora aliquam.</p>
</body>

</html>

What am I doing wrong?


慕妹3146593
1 answer

德玛西亚99

Your code sample has a few typos, but the main problem is that you are trying to get the next sibling of the root node. The node returned by Parse is the document root, which has no siblings, so your loop never runs. You first need to get to the html tag, then go to its first child, and then loop over that child's siblings:

package main

import (
    "bytes"
    "golang.org/x/net/html"
    "io/ioutil"
    "log"
)

func main() {
    // Use a name other than "html" so the variable does not shadow the package.
    htmlfile, err := ioutil.ReadFile("html/simple_01.html")
    if err != nil {
        log.Fatal(err)
    }
    doc, err := html.Parse(bytes.NewReader(htmlfile))
    if err != nil {
        log.Fatal(err)
    }
    // doc.FirstChild is the <!DOCTYPE> node; its next sibling is the <html> element.
    var htmlTag = doc.FirstChild.NextSibling
    var body *html.Node
    // <head> and <body> are children of <html>, so walk its child list.
    for s := htmlTag.FirstChild; s != nil; s = s.NextSibling {
        if s.Data == "body" {
            body = s
            break
        }
    }
    log.Println(body)
}