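I have the following code, and I'm trying to capture only the second case, the link that follows the text "But I want this one here". Currently, it captures both.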
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"

	"golang.org/x/net/html"
)

// getTag collects every element named tag, without looking at the text around it.
func getTag(doc *html.Node, tag string) []*html.Node {
	var nodes []*html.Node
	var crawler func(*html.Node)
	crawler = func(node *html.Node) {
		if node.Type == html.ElementNode && node.Data == tag {
			nodes = append(nodes, node)
			return // do not descend into a matched element
		}
		for child := node.FirstChild; child != nil; child = child.NextSibling {
			crawler(child)
		}
	}
	crawler(doc)
	return nodes
}

func main() {
	doc, err := html.Parse(strings.NewReader(testHTML))
	if err != nil {
		panic(err)
	}
	nodes := getTag(doc, "a")
	var buf bytes.Buffer
	w := io.Writer(&buf)
	// Render each matched node, one per line.
	for i, node := range nodes {
		html.Render(w, node)
		if i < len(nodes)-1 {
			w.Write([]byte("\n"))
		}
	}
	fmt.Println(buf.String())
}

var testHTML = `<html><body>
I do not want this link here <a href="blah">link text</a>
But I want this one here <a href="blah blah">more link text</a>
</body></html>`
This outputs:
<a href="blah">link text</a>
<a href="blah blah">more link text</a>
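I want to match specific text that comes right before an <a> tag and, if it matches, return that <a> node. For example, I would pass in "But I want this one here" and get back <a href="blah blah">more link text</a>. I've been told not to parse HTML with regular expressions, but now I'm stuck.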