Splitting a thesaurus string: too many possible starting characters, so the split needs "not equal" logic

I have a .dat file that is a dictionary/thesaurus containing about 300,000 lines.


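For each word, the lines below it that begin with a word in parentheses are the thesaurus alternatives, and the word in parentheses is the type, i.e. noun or adjective. For example: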


acceptant|1

(adj)|acceptive|receptive 

acceptation|3

(noun)|acceptance

(noun)|word meaning|word sense|sense|signified

(noun)|adoption|acceptance|espousal|blessing|approval|approving

accepted|6

(adj)|recognized|recognised|acknowledged 

(adj)|undisputed|uncontroversial |noncontroversial

(adj)|standard 

(adj)|acceptable|standard |received

(adj)|established |constituted

(adj)|received|conventional 

accepting|1

(adj)|acceptive 

So the sample above contains 4 words from the dictionary, but each word has one or more thesaurus entries.


I would like to split the string with something like:


strings.Split(dictionary, !"(")

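Meaning: split on anything that is not the "(" character. I need this because it is an extensive dictionary that includes slang, abbreviations and so on, so a headword can start with almost any character. But I can't work out how to use a "not equal" operator here.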


Does anyone know how to split using "not equal" logic? Or can anyone suggest a clever alternative?


胡说叔叔
2 Answers

江户川乱折腾

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
)

func main() {
    file, err := os.Open("dic.dat")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        data := scanner.Bytes()
        // Skip thesaurus entry lines, which start with "(".
        if bytes.HasPrefix(data, []byte("(")) {
            continue
        }
        line := scanner.Text()
        fmt.Println(line)
    }
    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, err)
    }
}

Output:

acceptant|1
acceptation|3
accepted|6
accepting|1

By design, Go code should be efficient. The Go standard library testing package includes a benchmark feature.

It is important to avoid unnecessary conversions and allocations. For example, converting a byte slice read from a file to a string costs an allocation and a copy. In this case, we only need to convert the accepted lines to strings. So prefer Bytes to Text.

$ go test dict_test.go -bench=.
BenchmarkText-4      500    2486306 ns/op    898528 B/op    14170 allocs/op
BenchmarkBytes-4    1000    1489828 ns/op     34080 B/op      609 allocs/op
$

Sample benchmark data:

KEY: Aback.
SYN: Backwards, rearwards, aft, abaft, astern, behind, back.
ANT: Onwards, forwards, ahead, before, afront, beyond, afore.
=
KEY: Abandon.
SYN: Leave, forsake, desert, renounce, cease, relinquish,
discontinue, castoff, resign, retire, quit, forego, forswear,
depart from, vacate, surrender, abjure, repudiate.
ANT: Pursue, prosecute, undertake, seek, court, cherish, favor,
protect, claim, maintain, defend, advocate, retain, support, uphold,
occupy, haunt, hold, assert, vindicate, keep.
=

dict_test.go:

package main

import (
    "bufio"
    "bytes"
    "io/ioutil"
    "strings"
    "testing"
)

// BenchmarkText converts every line to a string before testing its prefix.
func BenchmarkText(b *testing.B) {
    b.ReportAllocs()
    for N := 0; N < b.N; N++ {
        file := bytes.NewReader(benchData)
        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            line := scanner.Text()
            if !strings.HasPrefix(line, "KEY") {
                continue
            }
            _ = line // process line
        }
        if err := scanner.Err(); err != nil {
            b.Fatal(err)
        }
    }
}

// BenchmarkBytes tests the prefix on the raw bytes and converts only
// the accepted lines to strings.
func BenchmarkBytes(b *testing.B) {
    b.ReportAllocs()
    for N := 0; N < b.N; N++ {
        file := bytes.NewReader(benchData)
        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            data := scanner.Bytes()
            if !bytes.HasPrefix(data, []byte("KEY")) {
                continue
            }
            line := scanner.Text()
            _ = line // process line
        }
        if err := scanner.Err(); err != nil {
            b.Fatal(err)
        }
    }
}

var benchData = func() []byte {
    // A Complete Dictionary of Synonyms and Antonyms by Samuel Fallows
    // http://www.gutenberg.org/files/51155/51155-0.txt
    data, err := ioutil.ReadFile(`/home/peter/dictionary.51155-0.txt`)
    if err != nil {
        panic(err)
    }
    return data
}()

MMMHUHU

package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    file, err := os.Open("dic.dat")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    defer file.Close()

    scanner := bufio.NewScanner(file)
    for scanner.Scan() {
        line := scanner.Text()
        // Skip thesaurus entry lines, which start with "(".
        if strings.HasPrefix(line, "(") {
            continue
        }
        fmt.Println(line)
    }
}
