Splitting a string into 10kb chunks in Go

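I have a large string in Go that I'd like to split into smaller chunks. Each chunk should be at most 10kb. Chunks should be split on rune boundaries (not in the middle of a rune).

What is the idiomatic way to do this in Go? Should I just loop over the range of the string's bytes? Am I missing some useful stdlib package?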


蓝山帝景
171 views, 3 answers
3 Answers

天涯尽头无女友

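Use utf8.RuneStart (from the unicode/utf8 package) to scan for a rune boundary, and cut the string at that boundary.

    // s is the input string; requires import "unicode/utf8".
    var chunks []string
    for len(s) > 10000 {
        i := 10000
        // Step back until s[i] starts a rune, looking at only a few bytes.
        for i >= 10000-utf8.UTFMax && !utf8.RuneStart(s[i]) {
            i--
        }
        chunks = append(chunks, s[:i])
        s = s[i:]
    }
    if len(s) > 0 {
        chunks = append(chunks, s)
    }

With this approach, the application examines a few bytes at each chunk boundary instead of the entire string.

The code is written to guarantee progress even when the string is not valid UTF-8. You may want to handle that case as an error, or split the string differently.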
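To try the idea out, here is a minimal sketch that wraps the loop in a function. The name splitChunks and its max parameter are my own, not part of the answer. It also assumes a slightly different fallback for invalid UTF-8: if no rune start is found within the last utf8.UTFMax positions, it cuts at max.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// splitChunks is a hypothetical helper: it splits s into chunks of at most
// max bytes, cutting only where a rune starts when s is valid UTF-8.
func splitChunks(s string, max int) []string {
	if max < utf8.UTFMax {
		panic("splitChunks: max must be at least utf8.UTFMax")
	}
	var chunks []string
	for len(s) > max {
		i := max
		// A valid rune has at most UTFMax-1 continuation bytes, so a
		// boundary must lie within the last UTFMax positions.
		for i > max-utf8.UTFMax && !utf8.RuneStart(s[i]) {
			i--
		}
		if i <= max-utf8.UTFMax {
			// Assumption: on invalid UTF-8, cut at max to guarantee progress.
			i = max
		}
		chunks = append(chunks, s[:i])
		s = s[i:]
	}
	if len(s) > 0 {
		chunks = append(chunks, s)
	}
	return chunks
}

func main() {
	s := strings.Repeat("æøå", 5) // 15 runes, 2 bytes each
	for _, c := range splitChunks(s, 8) {
		fmt.Printf("%d bytes, valid UTF-8: %v\n", len(c), utf8.ValidString(c))
	}
}
```

For the 10kb case in the question you would call splitChunks(s, 10000), or 10240 if you count a kb as 1024 bytes.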

MMMHUHU

Take a look at this code:

    package main

    import (
        "fmt"
        "math/rand"
        "time"
    )

    func init() {
        // Optional since Go 1.20, which seeds the global generator automatically.
        rand.Seed(time.Now().UnixNano())
    }

    var alphabet = []rune{
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
        'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'æ', 'ø', 'å', 'A', 'B', 'C',
        'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S',
        'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'Æ', 'Ø', 'Å',
    }

    // randomString returns a string of n runes drawn from alphabet.
    func randomString(n int) string {
        b := make([]rune, n)
        for k := range b {
            b[k] = alphabet[rand.Intn(len(alphabet))]
        }
        return string(b)
    }

    // Each comment gives the value a byte must have after applying the mask.
    const (
        chunkSize int  = 100
        lead4Mask byte = 0xF8 // must equal 0xF0
        lead3Mask byte = 0xF0 // must equal 0xE0
        lead2Mask byte = 0xE0 // must equal 0xC0
        lead1Mask byte = 0x80 // must equal 0x00
        trailMask byte = 0xC0 // must equal 0x80
    )

    // longestPrefix returns a cut position <= n that lies on a rune boundary.
    // It assumes len(s) >= n and that s is valid UTF-8.
    func longestPrefix(s string, n int) int {
        for i := n - 1; ; i-- {
            if (s[i] & lead1Mask) == 0x00 {
                return i + 1 // single-byte rune: cut right after it
            }
            if (s[i] & trailMask) != 0x80 {
                return i // lead byte of a multi-byte rune: cut before it
            }
        }
    }

    func main() {
        s := randomString(100000)
        for len(s) > chunkSize {
            cut := longestPrefix(s, chunkSize)
            fmt.Println(s[:cut])
            s = s[cut:]
        }
        fmt.Println(s)
    }

I generate a random string of 100000 runes from the Danish/Norwegian alphabet. The "magic" then happens in longestPrefix. To help with the bit-masking part, refer to the figure below:

[Figure not available]

The program prints the chunks one per line. Each chunk is at most chunkSize bytes and ends on a rune boundary. Note that when a multi-byte rune ends exactly at the limit, longestPrefix cuts before that rune, so a chunk can be a little shorter than the longest possible one.
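For the bit-masking part, here is a small sketch of how the mask/value pairs from the constants classify a byte. The byteKind function and its labels are my own names, used only for illustration. It looks at the bit pattern alone and does not check that the whole sequence is valid UTF-8.

```go
package main

import "fmt"

// byteKind is a hypothetical helper that classifies one byte of UTF-8 text
// by its high bits, using the same mask/value pairs as the constants above.
func byteKind(b byte) string {
	switch {
	case b&0x80 == 0x00:
		return "ascii" // 0xxxxxxx: single-byte rune
	case b&0xC0 == 0x80:
		return "continuation" // 10xxxxxx: trailing byte
	case b&0xE0 == 0xC0:
		return "lead2" // 110xxxxx: starts a 2-byte rune
	case b&0xF0 == 0xE0:
		return "lead3" // 1110xxxx: starts a 3-byte rune
	case b&0xF8 == 0xF0:
		return "lead4" // 11110xxx: starts a 4-byte rune
	}
	return "invalid" // 11111xxx never appears in UTF-8
}

func main() {
	// One rune each of 1, 2, 3 and 4 bytes.
	for _, b := range []byte("aæ€\U0001F600") {
		fmt.Printf("%08b %s\n", b, byteKind(b))
	}
}
```

A cut position is safe whenever the byte right after the cut is anything other than a continuation byte, which is the same test utf8.RuneStart performs.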
