Custom string translation when decoding XML in Golang

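I am decoding some XML that contains only string values and attributes. It also contains a few instances of "&amp;amp;", which I would like decoded to "&" rather than "&amp;". I will be doing more work with these string values, and I need the character "|" never to appear, so I want to replace every "|" with "%7C".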


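I could make these changes with strings.Replace after decoding, but since the decoder is already doing similar work (after all, it does convert "&amp;" to "&"), I would like to do it at the same time.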
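
For comparison, the post-decoding approach mentioned above could be sketched roughly as follows. The names cleaner and cleanString are hypothetical, and strings.NewReplacer is used here in place of chained strings.Replace calls so that both substitutions happen in one pass.

```go
package main

import (
	"fmt"
	"strings"
)

// cleaner is a hypothetical helper: it turns the leftover "&amp;" into "&"
// and escapes "|" as "%7C" in a single pass over the string.
var cleaner = strings.NewReplacer("&amp;", "&", "|", "%7C")

// cleanString applies the replacements to an already-decoded value.
func cleanString(s string) string {
	return cleaner.Replace(s)
}

func main() {
	// After XML decoding, "X&amp;amp;Y" has already become "X&amp;Y".
	fmt.Println(cleanString("X&amp;Y | XnY"))
}
```

This works, but it has to be called on every field by hand, which is what the question is trying to avoid.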


The files I will be parsing are very large, so I will be doing something similar to http://blog.davidsingleton.org/parsing-huge-xml-files-with-go/


Here is a short example XML file:


<?xml version="1.0" encoding="utf-8"?>
<tests>
    <test_content>X&amp;amp;Y is a dumb way to write XnY | also here's a pipe.</test_content>
    <test_attr>
      <test name="Normal" value="still normal" />
      <test name="X&amp;amp;Y" value="should be the same as X&amp;Y | XnY would have been easier." />
    </test_attr>
</tests>

And some Go code that does a standard decode and prints out the results:


package main

import (
    "encoding/xml"
    "fmt"
    "os"
)

type XMLTests struct {
    Content string     `xml:"test_content"`
    Tests   []*XMLTest `xml:"test_attr>test"`
}

type XMLTest struct {
    Name  string `xml:"name,attr"`
    Value string `xml:"value,attr"`
}

func main() {
    xmlFile, err := os.Open("test.xml")
    if err != nil {
        fmt.Println("Error opening file:", err)
        return
    }
    defer xmlFile.Close()

    var q XMLTests

    decoder := xml.NewDecoder(xmlFile)

    // I tried this to no avail:
    // decoder.Entity = make(map[string]string)
    // decoder.Entity["|"] = "%7C"
    // decoder.Entity["&amp;amp;"] = "&"

    var inElement string
    for {
        t, _ := decoder.Token()
        if t == nil {
            break
        }
        switch se := t.(type) {
        case xml.StartElement:
            inElement = se.Name.Local
            if inElement == "tests" {
                decoder.DecodeElement(&q, &se)
            }
        default:
        }
    }

    fmt.Println(q.Content)
    for _, t := range q.Tests {
        fmt.Printf("\t%s\t\t%s\n", t.Name, t.Value)
    }
}

How do I modify this code to get what I want? In other words, how does one customize the decoder?


I looked at the docs, specifically https://golang.org/pkg/encoding/xml/#Decoder, and tried playing with the Entity map, but I was unable to make any progress.
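
As a point of reference for the Entity map attempt: Decoder.Entity is keyed by entity name (without the leading "&" and trailing ";") and is only consulted for named references of the form &name; in the input. It does not rewrite literal characters, which would explain why a "|" key has no effect. The sketch below illustrates this; decodeText and the entity name pipe are made up for the example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// decodeText (hypothetical helper) decodes one element and returns its
// character data, with the given map installed as Decoder.Entity.
func decodeText(doc string, entities map[string]string) (string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	d.Entity = entities
	var v struct {
		Text string `xml:",chardata"`
	}
	err := d.Decode(&v)
	return v.Text, err
}

func main() {
	// "pipe" is an invented entity name; the map key omits "&" and ";".
	entities := map[string]string{"pipe": "%7C"}

	// The named reference &pipe; is replaced, the literal "|" is left alone.
	s, err := decodeText("<t>a &pipe; b | c</t>", entities)
	fmt.Printf("%q %v\n", s, err)
}
```

Since the sample file contains a literal "|" rather than an entity reference, the Entity map alone does not seem to be enough for this use case.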


至尊宝的传说
1 answer

忽然笑

To handle attributes, you can use the UnmarshalerAttr interface with its UnmarshalXMLAttr method. Your example then becomes:

package main

import (
    "encoding/xml"
    "fmt"
    "strings"
)

// string2 is a string type whose XML decoding applies the extra replacements.
type string2 string

type XMLTests struct {
    Content string2    `xml:"test_content"`
    Tests   []*XMLTest `xml:"test_attr>test"`
}

type XMLTest struct {
    Name  string2 `xml:"name,attr"`
    Value string2 `xml:"value,attr"`
}

// decode escapes "|" and collapses the leftover "&amp;" into "&".
func decode(s string) string2 {
    s = strings.Replace(s, "|", "%7C", -1)
    s = strings.Replace(s, "&amp;", "&", -1)
    return string2(s)
}

// UnmarshalXML handles string2 values stored as element content.
func (s *string2) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    var content string
    if err := d.DecodeElement(&content, &start); err != nil {
        return err
    }
    *s = decode(content)
    return nil
}

// UnmarshalXMLAttr handles string2 values stored as attributes.
func (s *string2) UnmarshalXMLAttr(attr xml.Attr) error {
    *s = decode(attr.Value)
    return nil
}

func main() {
    xmlData := `<?xml version="1.0" encoding="utf-8"?>
<tests>
    <test_content>X&amp;amp;Y is a dumb way to write XnY | also here's a pipe.</test_content>
    <test_attr>
      <test name="Normal" value="still normal" />
      <test name="X&amp;amp;Y" value="should be the same as X&amp;Y | XnY would have been easier." />
    </test_attr>
</tests>`
    xmlFile := strings.NewReader(xmlData)

    var q XMLTests

    decoder := xml.NewDecoder(xmlFile)
    if err := decoder.Decode(&q); err != nil {
        fmt.Println("Error decoding:", err)
        return
    }

    fmt.Println(q.Content)
    for _, t := range q.Tests {
        fmt.Printf("\t%s\t\t%s\n", t.Name, t.Value)
    }
}

Output:

X&Y is a dumb way to write XnY %7C also here's a pipe.
    Normal        still normal
    X&Y        should be the same as X&Y %7C XnY would have been easier.

(You can test it on the Go Playground.)

So if using string2 everywhere is acceptable to you, this should do the trick.

(Edit: simpler code, without using DecodeElement and a type switch...)
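
Since the question mentions very large files, the string2 type can probably be combined with token-by-token streaming so that only one <test> element is held in memory at a time. The following is a minimal sketch under that assumption; streamTests and streamString are hypothetical helper names, and the type definitions are repeated so the example stands alone.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// string2, XMLTest and decode mirror the definitions in the answer.
type string2 string

type XMLTest struct {
	Name  string2 `xml:"name,attr"`
	Value string2 `xml:"value,attr"`
}

func decode(s string) string2 {
	s = strings.Replace(s, "|", "%7C", -1)
	s = strings.Replace(s, "&amp;", "&", -1)
	return string2(s)
}

func (s *string2) UnmarshalXMLAttr(attr xml.Attr) error {
	*s = decode(attr.Value)
	return nil
}

// streamTests (hypothetical) walks the token stream and calls fn once for
// each <test> element, decoding only that element into memory.
func streamTests(r io.Reader, fn func(XMLTest)) error {
	d := xml.NewDecoder(r)
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return nil // end of input
		}
		if err != nil {
			return err
		}
		se, ok := tok.(xml.StartElement)
		if !ok || se.Name.Local != "test" {
			continue // skip everything except <test> start tags
		}
		var t XMLTest
		if err := d.DecodeElement(&t, &se); err != nil {
			return err
		}
		fn(t)
	}
}

// streamString is a small convenience wrapper for in-memory input.
func streamString(doc string, fn func(XMLTest)) error {
	return streamTests(strings.NewReader(doc), fn)
}

func main() {
	doc := `<tests>
    <test_attr>
      <test name="Normal" value="still normal" />
      <test name="X&amp;amp;Y" value="X&amp;Y | XnY" />
    </test_attr>
</tests>`
	err := streamString(doc, func(t XMLTest) {
		fmt.Printf("%s\t%s\n", t.Name, t.Value)
	})
	if err != nil {
		fmt.Println("Error decoding:", err)
	}
}
```

For a real file, the os.File returned by os.Open can be passed to streamTests directly, as in the question's code.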