
Decoding email subjects and headers in different charsets, such as ISO-2022-JP, GB-2312, etc.

I'm working on a project that needs to handle email encoding/decoding for different charsets. In Python, the code could look like this:


from email.header import Header, decode_header, make_header
from charset import text_to_utf8

class ....
    def decode_header(self, header):
        decoded_header = decode_header(header)

        if decoded_header[0][1] is None:
            return text_to_utf8(decoded_header[0][0]).decode("utf-8", "replace")
        else:
            return decoded_header[0][0].decode(decoded_header[0][1].replace("windows-", "cp"), "replace")

Basically, for a header like "=?iso-2022-jp?b?GyRCRW1CQE86GyhCIDxtb21vQHRhcm8ubmUuanA=?=", the decode_header function just finds the encoding, 'iso-2022-jp', and the decode method then converts the bytes from that charset to unicode.
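To make those two steps concrete, here is a minimal sketch in Go that takes the encoded word apart by hand, using only the standard library. The helpers splitEncodedWord and rawPayload are hypothetical names made up for this illustration, and the sketch assumes a single well-formed encoded word with the "b" (base64) transfer encoding. It stops at the raw bytes: turning them from ISO-2022-JP into UTF-8 is the separate, charset-specific step.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// splitEncodedWord is a hypothetical helper that takes a single encoded word
// of the form =?charset?encoding?payload?= and returns its three fields.
// It does no validation beyond checking the delimiters and the field count.
func splitEncodedWord(word string) (charset, enc, payload string, err error) {
	if len(word) < 4 || !strings.HasPrefix(word, "=?") || !strings.HasSuffix(word, "?=") {
		return "", "", "", fmt.Errorf("not an encoded word: %q", word)
	}
	fields := strings.Split(word[2:len(word)-2], "?")
	if len(fields) != 3 {
		return "", "", "", fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	return strings.ToLower(fields[0]), strings.ToLower(fields[1]), fields[2], nil
}

// rawPayload undoes the "b" (base64) transfer encoding and returns the bytes,
// which are still in the original charset. Only "b" is handled in this sketch.
func rawPayload(enc, payload string) ([]byte, error) {
	if enc != "b" {
		return nil, fmt.Errorf("unsupported transfer encoding %q", enc)
	}
	return base64.StdEncoding.DecodeString(payload)
}

func main() {
	word := "=?iso-2022-jp?b?GyRCRW1CQE86GyhCIDxtb21vQHRhcm8ubmUuanA=?="
	charset, enc, payload, err := splitEncodedWord(word)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	raw, err := rawPayload(enc, payload)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The bytes are still ISO-2022-JP at this point (note the escape
	// sequences); converting them to UTF-8 needs a charset decoder.
	fmt.Printf("charset=%s encoding=%s raw=%q\n", charset, enc, raw)
}
```

The printed bytes begin with the escape sequence ESC $ B, which is how ISO-2022-JP switches into its two-byte Japanese character set, so a plain string conversion is not enough here.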


Now, in Go, I can do something similar:


import "mime"

dec := new(mime.WordDecoder)
text := "=?utf-8?q?=C3=89ric?= <eric@example.org>, =?utf-8?q?Ana=C3=AFs?= <anais@example.org>"
header, err := dec.DecodeHeader(text)
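For reference, the fragment above can be turned into a complete program roughly as follows. This is only a sketch: decodeHeader is a hypothetical wrapper name, and the second call is added to show what happens when the charset is not one that mime.WordDecoder handles by default and no CharsetReader is set.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeHeader decodes encoded words using only the charsets that
// mime.WordDecoder supports out of the box (utf-8, iso-8859-1, us-ascii).
func decodeHeader(header string) (string, error) {
	dec := new(mime.WordDecoder)
	return dec.DecodeHeader(header)
}

func main() {
	text := "=?utf-8?q?=C3=89ric?= <eric@example.org>, =?utf-8?q?Ana=C3=AFs?= <anais@example.org>"
	header, err := decodeHeader(text)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(header)

	// With no CharsetReader set, a charset outside the built-in list is
	// reported as an error instead of being decoded.
	_, err = decodeHeader("=?iso-2022-jp?b?GyRCRW1CQE86GyhCIDxtb21vQHRhcm8ubmUuanA=?=")
	fmt.Println("iso-2022-jp:", err)
}
```

The first call yields the two names with their accented characters restored; the second returns a non-nil error, which is the gap a charset decoder has to fill.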


It seems that mime.WordDecoder allows you to plug in a charset decoder "hook":

type WordDecoder struct {
    // CharsetReader, if non-nil, defines a function to generate
    // charset-conversion readers, converting from the provided
    // charset into UTF-8.
    // Charsets are always lower-case. utf-8, iso-8859-1 and us-ascii charsets
    // are handled by default.
    // One of the CharsetReader's result values must be non-nil.
    CharsetReader func(charset string, input io.Reader) (io.Reader, error)
}
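As a rough illustration of how such a hook can be filled in by hand, the sketch below supports exactly one additional charset, iso-8859-15, using only the standard library. The names latin9Overrides, latin9Reader and decodeWithHook are hypothetical and exist only for this example; the override table assumes the eight positions where ISO-8859-15 differs from ISO-8859-1.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"strings"
)

// latin9Overrides lists the bytes where ISO-8859-15 differs from ISO-8859-1;
// every other byte maps to the rune with the same numeric value.
var latin9Overrides = map[byte]rune{
	0xA4: '€', 0xA6: 'Š', 0xA8: 'š', 0xB4: 'Ž',
	0xB8: 'ž', 0xBC: 'Œ', 0xBD: 'œ', 0xBE: 'Ÿ',
}

// latin9Reader is a hypothetical CharsetReader hook. It handles a single
// extra charset and rejects everything else with an error.
func latin9Reader(charset string, input io.Reader) (io.Reader, error) {
	// WordDecoder passes the charset name in lower case.
	if charset != "iso-8859-15" {
		return nil, fmt.Errorf("unhandled charset %q", charset)
	}
	content, err := io.ReadAll(input)
	if err != nil {
		return nil, err
	}
	var sb strings.Builder
	for _, c := range content {
		if r, ok := latin9Overrides[c]; ok {
			sb.WriteRune(r)
		} else {
			sb.WriteRune(rune(c))
		}
	}
	return strings.NewReader(sb.String()), nil
}

// decodeWithHook decodes a header with the hook installed.
func decodeWithHook(header string) (string, error) {
	dec := mime.WordDecoder{CharsetReader: latin9Reader}
	return dec.DecodeHeader(header)
}

func main() {
	out, err := decodeWithHook("=?ISO-8859-15?Q?100_=A4?=")
	fmt.Println(out, err)
}
```

Writing one such function per charset clearly does not scale, and multi-byte charsets like ISO-2022-JP need far more than a small lookup table.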

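I'd like to know whether there is a library that lets me convert arbitrary charsets, like the decode method in Python shown in the example above. I don't want to write a big switch-case like the one used in mime/encodedword.go: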


func (d *WordDecoder) convert(buf *bytes.Buffer, charset string, content []byte) error {
    switch {
    case strings.EqualFold("utf-8", charset):
        buf.Write(content)
    case strings.EqualFold("iso-8859-1", charset):
        for _, c := range content {
            buf.WriteRune(rune(c))
        }
    ....

Any help would be greatly appreciated.


皈依舞
2 answers

猛跑小猪

It seems the golang.org/x/net/html/charset package already provides a map of the available encodings. The following code works for me:

import (
    "fmt"
    "io"
    "mime"
    "strings"

    "golang.org/x/net/html/charset"
)

CharsetReader := func(label string, input io.Reader) (io.Reader, error) {
    label = strings.Replace(label, "windows-", "cp", -1)
    // Lookup returns a nil encoding when the label is not recognized.
    encoding, _ := charset.Lookup(label)
    if encoding == nil {
        return nil, fmt.Errorf("unsupported charset: %q", label)
    }
    return encoding.NewDecoder().Reader(input), nil
}

dec := mime.WordDecoder{CharsetReader: CharsetReader}
text := "=?iso-2022-jp?b?GyRCRW1CQE86GyhCIDxtb21vQHRhcm8ubmUuanA=?="
header, err := dec.DecodeHeader(text)

Thanks for your help!

潇湘沐

I'm not sure this is what you're looking for, but I'm using the golang.org/x/text package to convert Windows-1251 to UTF-8. The code looks like this:

import (
    "io"
    "strings"

    "golang.org/x/text/encoding/charmap"
    "golang.org/x/text/transform"
)

func convert(s string) string {
    sr := strings.NewReader(s)
    tr := transform.NewReader(sr, charmap.Windows1251.NewDecoder())
    buf, err := io.ReadAll(tr)
    if err != nil {
        return ""
    }
    return string(buf)
}

I think in your case, if you want to avoid "a big switch-case", you could create a kind of map with the full list of available encodings and then do something like:

var encodings = map[string]transform.Transformer{
    "win-1251": charmap.Windows1251.NewDecoder(),
}

func convert(s, charset string) string {
    dec, ok := encodings[charset]
    if !ok {
        // A missing key yields a nil Transformer, which cannot be used.
        return ""
    }
    buf, err := io.ReadAll(transform.NewReader(strings.NewReader(s), dec))
    if err != nil {
        return ""
    }
    return string(buf)
}