
Go XML - Parsing boolean attributes in HTML causes an XML validation error

I have HTML output that contains the following tag.


<hr noshade>

My struct is


type Hr struct {

    TagName xml.Name `xml:"hr"`

}

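When I try to parse the HTML with "encoding/xml", it returns an error saying the attribute has no '=' character.


I've seen that this error is raised because the default decoder has Strict set to true.


How can I ignore this and keep parsing the document (using xml.Unmarshal())?


Edit: including the XML and the structs used.


I found the decoder settings and used NewDecoder, but the unmarshalling doesn't seem to happen correctly.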


<html><head><title>Some title</title></head>

<body>

 <h2>Title here</h2>

 <ul>

  <li><a href="../">..</a></li>

  <li><a href="file1.txt">file1.txt</a></li>

  <li><a href="file2.zip">file2.zip</a></li>

  .....

 </ul>

 <hr noshade><em>Powered by <a href="http://subversion.apache.org/">Apache Subversion</a> version 1.7.18 (r1615261).</em>

</body></html>

The code I have written so far


type Anchor struct {

    TagName xml.Name `xml:"a"`

    Href    string   `xml:"href,attr"`

}


type ListEntry struct {

    TagName  xml.Name `xml:"li"`

    Filename Anchor

}


type DirList struct {

    XMLName xml.Name `xml:"ul"`

    Entries []ListEntry

}


type Header struct {

    TagName xml.Name `xml:"h2"`

}


type Head struct {

    TagName xml.Name `xml:"head"`

    title   Title

}


type Title struct {

    TagName xml.Name `xml:"title"`

}


type html struct {

    TagName xml.Name `xml:"html"`

    body    Body     `xml:"body"`

    head    Head

}


type Body struct {

    H2            Header

    DirectoryList DirList

    hr            Hr

    em            Em

}


type Hr struct {

    TagName xml.Name `xml:"hr"`

}


type Em struct {

    TagName xml.Name `xml:"em"`

    link    Anchor

}


   contents := retrieveFromWeb()


    htmlTag := html{}

    decoder := xml.NewDecoder(strings.NewReader(contents))

    decoder.Strict = false

    decoder.AutoClose = xml.HTMLAutoClose

    decoder.Entity = xml.HTMLEntity


    err = decoder.Decode(&htmlTag)


    fmt.Println("DirList: ", htmlTag)

Current output


DirList:  {{ } {{{ }} {{ } []} {{ }} {{ } {{ } }}} {{ } {{ }}}}


温温酱
2 answers

慕无忌1623718

You can use a Decoder to unmarshal. With a Decoder you can turn off strict parsing and get past the error you are facing. Since you only posted a single line of xml/html to parse, I have assumed a root element and some value between the hr tags. Below is a sample implementation.

package main

import (
    "encoding/xml"
    "fmt"
    "strings"
)

type Hr struct {
    XMLName xml.Name `xml:"a"`
    TagName string   `xml:"hr"`
}

func main() {
    s := "<a><hr noshade>value</hr></a>"
    hr := &Hr{}
    d := xml.NewDecoder(strings.NewReader(s))
    // Non-strict mode accepts attributes without a value, such as noshade.
    d.Strict = false
    err := d.Decode(hr)
    if err != nil {
        panic(err)
    }
    fmt.Println(hr.TagName)
}

fmt.Println(hr.TagName) will print "value".
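
To make the effect of Strict concrete, here is a minimal sketch that decodes the same fragment twice, once strict and once lenient. The page type, the sample constant and the decodePage helper are hypothetical names invented for this illustration, and the fragment is a cut-down version of the question's body. In lenient mode a value-less attribute such as noshade is given its own name as its value, and xml.HTMLAutoClose lets the unclosed hr end when the next tag starts.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// page is a hypothetical, cut-down model of the question's <body>.
type page struct {
	XMLName xml.Name `xml:"body"`
	Hr      struct {
		// In non-strict mode a bare attribute gets its own name as value.
		NoShade string `xml:"noshade,attr"`
	} `xml:"hr"`
	Em string `xml:"em"`
}

// sample is an assumed fragment, not the full page from the question.
const sample = `<body><hr noshade><em>Powered by</em></body>`

// decodePage decodes src with the requested strictness.
func decodePage(src string, strict bool) (page, error) {
	var p page
	d := xml.NewDecoder(strings.NewReader(src))
	d.Strict = strict
	if !strict {
		// HTML-oriented settings, as recommended by the package docs.
		d.AutoClose = xml.HTMLAutoClose
		d.Entity = xml.HTMLEntity
	}
	err := d.Decode(&p)
	return p, err
}

func main() {
	// Strict decoding rejects the bare attribute.
	if _, err := decodePage(sample, true); err != nil {
		fmt.Println("strict:", err)
	}

	// Lenient decoding accepts it and auto-closes <hr>.
	p, err := decodePage(sample, false)
	if err != nil {
		fmt.Println("lenient:", err)
		return
	}
	fmt.Printf("noshade=%q em=%q\n", p.Hr.NoShade, p.Em)
}
```

Note that xml.Unmarshal always uses a strict decoder, so for HTML-like input a Decoder configured this way is the only option within encoding/xml.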

慕斯709654

There are several mistakes in your code:

If a field is not exported, it cannot be accessed by another package (xml in this case): capitalize all field names.
The li tag name is missing.

See this working code: http://play.golang.org/p/rkNf2OfvdM

package main

import (
    "encoding/xml"
    "fmt"
    "log"
    "strings"
)

type Anchor struct {
    XMLName xml.Name `xml:"a"`
    Href    string   `xml:"href,attr"`
}

type ListEntry struct {
    XMLName  xml.Name `xml:"li"`
    Filename Anchor
}

type DirList struct {
    XMLName xml.Name    `xml:"ul"`
    Entries []ListEntry `xml:"li"`
}

type Header struct {
    XMLName xml.Name `xml:"h2"`
}

type Head struct {
    XMLName xml.Name `xml:"head"`
    Title   Title
}

type Title struct {
    XMLName xml.Name `xml:"title"`
}

type Html struct {
    XMLName xml.Name `xml:"html"`
    Body    Body     `xml:"body"`
    Head    Head
}

type Body struct {
    H2            Header
    DirectoryList DirList
    Hr            Hr
    Em            Em
}

type Hr struct {
    XMLName xml.Name `xml:"hr"`
}

type Em struct {
    XMLName xml.Name `xml:"em"`
    // Exported (was "link") so that encoding/xml can populate it.
    Link Anchor
}

var contents = `<html><head><title>Some title</title></head>
<body>
 <h2>Title here</h2>
 <ul>
  <li><a href="../">..</a></li>
  <li><a href="file1.txt">file1.txt</a></li>
  <li><a href="file2.zip">file2.zip</a></li>
 </ul>
 <hr noshade><em>Powered by <a href="http://subversion.apache.org/">Apache Subversion</a> version 1.7.18 (r1615261).</em>
</body></html>`

func main() {
    htmlTag := Html{}
    decoder := xml.NewDecoder(strings.NewReader(contents))
    // Lenient, HTML-oriented decoder settings.
    decoder.Strict = false
    decoder.AutoClose = xml.HTMLAutoClose
    decoder.Entity = xml.HTMLEntity

    err := decoder.Decode(&htmlTag)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("DirList: %v %#[1]v\n", htmlTag)
}
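
The two points above can be seen in isolation with a small sketch. The type names (lowerList, upperList, entry), the listing constant and both helper functions are hypothetical and exist only for this illustration. It assumes a well-formed ul fragment, so plain xml.Unmarshal is enough here. The unexported field is skipped without any error, which is why the question's output was empty rather than a failure.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// entry is a hypothetical model of one <li><a href="..."></a></li> item.
type entry struct {
	Link struct {
		Href string `xml:"href,attr"`
	} `xml:"a"`
}

// lowerList repeats the question's mistake: the field is unexported,
// so encoding/xml silently skips it.
type lowerList struct {
	XMLName xml.Name `xml:"ul"`
	entries []entry
}

// upperList exports the field and names the repeated child element.
type upperList struct {
	XMLName xml.Name `xml:"ul"`
	Entries []entry  `xml:"li"`
}

// listing is an assumed, shortened version of the question's list.
const listing = `<ul><li><a href="../">..</a></li><li><a href="file1.txt">file1.txt</a></li></ul>`

// hrefs returns the href of every list item, using the exported struct.
func hrefs(src string) ([]string, error) {
	var l upperList
	if err := xml.Unmarshal([]byte(src), &l); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(l.Entries))
	for _, e := range l.Entries {
		out = append(out, e.Link.Href)
	}
	return out, nil
}

// countUnexported reports how many items land in the unexported field.
func countUnexported(src string) (int, error) {
	var l lowerList
	err := xml.Unmarshal([]byte(src), &l)
	return len(l.entries), err
}

func main() {
	n, err := countUnexported(listing)
	fmt.Println("unexported field:", n, "entries, err =", err)

	links, err := hrefs(listing)
	fmt.Println("exported field:", links, "err =", err)
}
```

A slice field without a tag would be matched against an element named after the field (Entries), not after the element type's XMLName, so the explicit li tag on the slice is what makes the repeated items land in it.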