Recursive function on *html.Node to print all links in an HTML document

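I'm trying to print all the links in an HTML document using a function that takes an `*html.Node` as an argument. I'm new to Golang and to the `*html.Node` data type; I have never used them before.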


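```go
var i int = 0 // package-level call counter

func visit(links []string, n *html.Node) []string {
	if n == nil {
		return links
	}
	if n.Type == html.ElementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				links = append(links, a.Val)
			}
		}
	}
	// Intended: descend into the first child only on the first call,
	// then move across siblings on every later call.
	if i == 0 {
		i++
		return visit(links, n.FirstChild)
	}
	return visit(links, n.NextSibling)
}
```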

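The purpose of the `if` block that checks `i == 0` is to make sure `return visit(links, n.FirstChild)` runs only once (the first time) and that `return visit(links, n.NextSibling)` runs on every later call. However, nothing is ever appended to `links`, and an empty slice is always returned. I don't understand what is wrong with my code.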


The code works fine when I use a for loop, but it breaks when I try to use recursion instead:


```go
for c := n.FirstChild; c != nil; c = c.NextSibling {
	links = visit(links, c)
}
```


桃花长相依

POPMUISE

Your code doesn't work because it takes the first child of the document, which is the `html` element, and then takes that element's sibling, which is `nil`, so the function ends with an empty `links` slice.

Detailed explanation: here is a sample program:

```go
package main

import (
	"fmt"
	"log"
	"strings"

	"golang.org/x/net/html"
)

var i int = 0

func visit(links []string, n *html.Node) []string {
	if n == nil {
		return links
	}
	if n.Type == html.ElementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				links = append(links, a.Val)
			}
		}
	}
	if i == 0 {
		i++
		return visit(links, n.FirstChild)
	}
	return visit(links, n.NextSibling)
}

func main() {
	s := `<p>Links:</p><ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul>`
	doc, err := html.Parse(strings.NewReader(s))
	if err != nil {
		log.Fatal(err)
	}
	links := visit([]string{}, doc)
	fmt.Println(links)
}
```

First call to `visit`, arguments: `links = []`, `n = DocumentNode`

In the first call, `i == 0`, so it calls `visit` recursively with the first child of the document node.

Second call to `visit`, arguments: `links = []`, `n = ElementNode (n.Data = "html")`

In the second call, `n` is the `html` element node. Now the third call to `visit` is made with the next sibling of the `html` element node. This is where the problem lies: the `html` element node has no sibling, so `n` will be `nil`.

Third call to `visit`, arguments: `links = []`, `n = nil`

So all 3 recursive calls now return, execution goes back to `main`, and the `links` slice stays empty.

Hope that makes it clear. The correct way to write this function is with the loop you shared in your question, as follows:

```go
package main

import (
	"fmt"
	"log"
	"strings"

	"golang.org/x/net/html"
)

func visit(links []string, n *html.Node) []string {
	if n.Type == html.ElementNode && n.Data == "a" {
		for _, a := range n.Attr {
			if a.Key == "href" {
				links = append(links, a.Val)
			}
		}
	}
	// Recurse into every child, so nested <a> elements are reached.
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		links = visit(links, c)
	}
	return links
}

func main() {
	s := `<p>Links:</p><ul><li><a href="foo">Foo</a><li><a href="/bar/baz">BarBaz</a></ul>`
	doc, err := html.Parse(strings.NewReader(s))
	if err != nil {
		log.Fatal(err)
	}
	links := visit([]string{}, doc)
	fmt.Println(links)
}
```

Here, the loop finds links recursively by checking the children of each HTML element. When an HTML element has no further siblings, the traversal simply moves on to the next sibling of its parent and checks that.
