Text parsing of log files in Go

Newbie here!


I'm trying to write a Go program that parses a log file and returns specific information from the matching lines.


To illustrate what I want to achieve, I'll start with a log file that looks like this:


2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B

From this I want to extract every "REQUEST-A" line and print the time at which each request occurred to the terminal or to a file.


I've tried os.Open with a Scanner, and I can use Scanner.Text to record that an occurrence of my string was found, like so:


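package main

import (
    "bufio"
    "fmt"
    "os"
    "strings"
)

func main() {
    request := 0
    f, err := os.Open("request.log")
    if err != nil {
        fmt.Print("There has been an error!: ", err)
        return // f is nil here, so there is nothing to scan
    }
    defer f.Close()
    scanner := bufio.NewScanner(f)

    for scanner.Scan() {
        // Count every line that mentions REQUEST-A.
        if strings.Contains(scanner.Text(), "REQUEST-A") {
            request = request + 1
        }
        fmt.Println(request) // running count, printed after every line
    }

    // Scan returns false on EOF and on error; Err tells the two apart.
    if err := scanner.Err(); err != nil {
        fmt.Print("There has been an error!: ", err)
    }
}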

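But I'm not sure how to use this to retrieve the information I want. Normally I'd use Bash, but I thought I'd branch out and see whether I could do it in Go. Any suggestions would be much appreciated.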


慕标琳琳
3 answers

翻过高山走不出你

In Go, we strive for efficiency. Don't do unnecessary work. For example,

package main

import (
    "bufio"
    "bytes"
    "fmt"
    "os"
)

func main() {
    lines, requestA := 0, 0
    f, err := os.Open("request.log")
    if err != nil {
        fmt.Print("There has been an error!: ", err)
        return // nothing to scan without an open file
    }
    defer f.Close()
    scanner := bufio.NewScanner(f)
    for scanner.Scan() {
        lines++
        // filter request a
        // Byte 30 is the last character of "<19-byte timestamp> - REQUEST-A",
        // so this cheap check rejects most non-matching lines early.
        line := scanner.Bytes()
        if len(line) <= 30 || line[30] != 'A' {
            continue
        }
        // The request name starts at byte 22 (timestamp plus " - ").
        if !bytes.Equal(line[22:], []byte("REQUEST-A")) {
            continue
        }
        requestA++
        request := string(line)
        // handle request a
        fmt.Println(request)
    }
    if err := scanner.Err(); err != nil {
        fmt.Println(err)
    }
    fmt.Println(lines, requestA)
}

Output:

$ go run request.go
2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
6 3

$ cat request.log
2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B

To emphasize why efficiency matters (logs can be very large), let's run a benchmark against Markus W Mahlberg's solution: https://play.golang.org/p/R2D_BeiJvx9.

$ go test log_test.go -bench=. -benchmem
BenchmarkPeterSO-4     21285     56953 ns/op     4128 B/op       2 allocs/op
BenchmarkMarkusM-4       649   1817868 ns/op    84747 B/op    2390 allocs/op

log_test.go:

package main

import (
    "bufio"
    "bytes"
    "regexp"
    "strings"
    "testing"
)

var requestLog = `
2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B
`

// requestLog[1:] drops the leading newline; the log is repeated 256 times.
var benchLog = strings.Repeat(requestLog[1:], 256)

func BenchmarkPeterSO(b *testing.B) {
    for N := 0; N < b.N; N++ {
        scanner := bufio.NewScanner(strings.NewReader(benchLog))
        for scanner.Scan() {
            // filter request a
            line := scanner.Bytes()
            if len(line) <= 30 || line[30] != 'A' {
                continue
            }
            if !bytes.Equal(line[22:], []byte("REQUEST-A")) {
                continue
            }
            request := string(line)
            // handle request a
            _ = request
        }
        if err := scanner.Err(); err != nil {
            b.Fatal(err)
        }
    }
}

func BenchmarkMarkusM(b *testing.B) {
    for N := 0; N < b.N; N++ {
        // The regexp is compiled once per benchmark iteration, not once per line.
        var re *regexp.Regexp = regexp.MustCompile(`^(\S*) - REQUEST-A$`)
        scanner := bufio.NewScanner(strings.NewReader(benchLog))
        var res []string
        for scanner.Scan() {
            if res = re.FindStringSubmatch(scanner.Text()); len(res) > 0 {
                _ = res[1]
            }
        }
        if err := scanner.Err(); err != nil {
            b.Fatal(err)
        }
    }
}
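The offsets 22 and 30 above only hold while the timestamp stays exactly 19 bytes wide. If that can't be guaranteed, one possible variation is to split on the " - " separator with bytes.Cut (Go 1.18+) and still work on scanner.Bytes(). This is only a sketch and was not part of the benchmark; the helper name timestampFor and the sample constant are made up for illustration.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// sep separates the timestamp from the request name in the sample log.
var sep = []byte(" - ")

// timestampFor (hypothetical helper) returns the timestamp of line when the
// text after the first " - " is exactly name. It relies on the separator
// instead of fixed byte offsets.
func timestampFor(line, name []byte) ([]byte, bool) {
	ts, rest, found := bytes.Cut(line, sep)
	if !found || !bytes.Equal(rest, name) {
		return nil, false
	}
	return ts, true
}

// sample mirrors the request.log from the question.
const sample = `2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B
`

func main() {
	name := []byte("REQUEST-A")
	scanner := bufio.NewScanner(strings.NewReader(sample))
	for scanner.Scan() {
		// scanner.Bytes avoids allocating a string for lines that are skipped.
		if ts, ok := timestampFor(scanner.Bytes(), name); ok {
			fmt.Printf("%s\n", ts)
		}
	}
	if err := scanner.Err(); err != nil {
		fmt.Println("scan error:", err)
	}
}
```

The slice returned by timestampFor points into the scanner's buffer, so it should be printed or copied before the next call to Scan.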

慕码人2483693

Use the following code to print the time field of the log entries whose value is "REQUEST-A".

for scanner.Scan() {
    line := scanner.Text()
    if len(line) < 19 { // too short to hold a full timestamp
        continue
    }
    if line[19:] == " - REQUEST-A" {
        fmt.Println(line[:19])
    }
}

Run it on the Go playground!

To write to a file, redirect stdout to the file.

The code above assumes that everything after the timestamp is " - REQUEST-A". If " - REQUEST-A" is a prefix of further data, use the following instead:

const lenTimestamp = 19

for scanner.Scan() {
    line := scanner.Text()
    if len(line) < lenTimestamp {
        continue
    }
    if strings.HasPrefix(line[lenTimestamp:], " - REQUEST-A") {
        fmt.Println(line[:lenTimestamp])
    }
}

Run this version on the playground.
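If you'd rather write the file from Go than redirect stdout, something along these lines might work. It is only a sketch: the helpers timestampIfMatch, writeMatches and run, the embedded sample constant, and the "optional output path as first argument" convention are all assumptions of mine, not part of the snippets above.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

const lenTimestamp = 19 // width of "2019-09-30T04:17:02"

// sample stands in for request.log so the sketch runs on its own.
const sample = `2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:20:02 - REQUEST-A
`

// timestampIfMatch (hypothetical helper) returns the timestamp when the text
// after it starts with " - " followed by request. Like the prefix version
// above, it also accepts lines with trailing data after the request name.
func timestampIfMatch(line, request string) (string, bool) {
	if len(line) < lenTimestamp {
		return "", false
	}
	if !strings.HasPrefix(line[lenTimestamp:], " - "+request) {
		return "", false
	}
	return line[:lenTimestamp], true
}

// writeMatches writes the timestamp of every matching line in r to w.
func writeMatches(w io.Writer, r io.Reader, request string) error {
	out := bufio.NewWriter(w)
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		if ts, ok := timestampIfMatch(scanner.Text(), request); ok {
			fmt.Fprintln(out, ts)
		}
	}
	if err := scanner.Err(); err != nil {
		return err
	}
	return out.Flush() // push buffered output to w
}

// run writes to the file named by the first argument, or to stdout if none.
func run() error {
	var w io.Writer = os.Stdout
	if len(os.Args) > 1 {
		f, err := os.Create(os.Args[1])
		if err != nil {
			return err
		}
		defer f.Close()
		w = f
	}
	return writeMatches(w, strings.NewReader(sample), "REQUEST-A")
}

func main() {
	if err := run(); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

Because writeMatches takes an io.Reader, swapping the sample string for a file opened with os.Open should need no other changes.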

函数式编程

If you're on Linux or a Mac, you don't need a Go program:

$ echo "2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B" | awk '/REQUEST-A/{print $1}' | tee request.log
2019-09-30T04:17:02
2019-09-30T04:20:02
2019-09-30T04:21:02

However, if you really want to do this in Go:

package main

import (
    "bufio"
    "fmt"
    "regexp"
    "strings"
)

const input = `2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B
`

var scanner = bufio.NewScanner(strings.NewReader(input))

// Here comes the magic: We create an anonymous group containing all
// non-whitespace characters up to the first blank. Since it is
// a group, we can easily extract it later down the road.
var re = regexp.MustCompile(`^(\S*) - REQUEST-A$`)

func main() {
    var res []string
    for scanner.Scan() {
        // We use re.FindStringSubmatch here, as it actually kills two
        // birds with one stone: We check whether it is REQUEST-A
        // and the anonymous group of the regexp contains what we are looking for.
        if res = re.FindStringSubmatch(scanner.Text()); len(res) > 0 {
            fmt.Println(res[1])
        }
    }
}

Run on playground
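If the request name shouldn't be baked into the pattern, the same regexp idea can probably be parameterized. The sketch below builds the pattern from the name with regexp.QuoteMeta so that any metacharacters in the name are matched literally. The function timesFor and the two-name loop in main are hypothetical and not part of the code above.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

const input = `2019-09-30T04:17:02 - REQUEST-A
2019-09-30T04:18:02 - REQUEST-C
2019-09-30T04:19:02 - REQUEST-B
2019-09-30T04:20:02 - REQUEST-A
2019-09-30T04:21:02 - REQUEST-A
2019-09-30T04:22:02 - REQUEST-B
`

// timesFor (hypothetical helper) returns the timestamps of all lines in log
// whose request name is exactly request.
func timesFor(log, request string) ([]string, error) {
	// QuoteMeta escapes regexp metacharacters, so the name is matched
	// literally and the pattern is always valid.
	re := regexp.MustCompile(`^(\S+) - ` + regexp.QuoteMeta(request) + `$`)

	var times []string
	scanner := bufio.NewScanner(strings.NewReader(log))
	for scanner.Scan() {
		// m[0] is the whole match, m[1] the captured timestamp.
		if m := re.FindStringSubmatch(scanner.Text()); m != nil {
			times = append(times, m[1])
		}
	}
	return times, scanner.Err()
}

func main() {
	for _, name := range []string{"REQUEST-A", "REQUEST-B"} {
		times, err := timesFor(input, name)
		if err != nil {
			fmt.Println("scan error:", err)
			return
		}
		fmt.Println(name, times)
	}
}
```

The pattern is compiled once per call, outside the scan loop, so compilation is not repeated for every line.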