Why does reading a file line by line use so much memory?

I'm trying to read a large file in the following format:


a string key, 200 values separated by commas

and store it in a map.


I wrote this code:


```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"runtime"
	"strings"
)

func main() {
	file, err := os.Open("file_address.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	mp := make(map[string]float32)
	var total_size int64 = 0
	scanner := bufio.NewScanner(file)
	var counter int64 = 0

	for scanner.Scan() {
		counter++
		sliced := strings.Split(scanner.Text(), ",")
		total_size += int64(len(sliced[0])) // approximate key bytes, for the "Took ... Mb" line
		mp[sliced[0]] = 2.2
	}

	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("loaded: %d. Took %d Mb of memory.", counter, total_size/1024.0/1024.0)
	fmt.Println("Loading finished. Now waiting...")

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	fmt.Printf("\n")
	fmt.Printf("Alloc: %d MB, TotalAlloc: %d MB, Sys: %d MB\n",
		ms.Alloc/1024/1024, ms.TotalAlloc/1024/1024, ms.Sys/1024/1024)
	fmt.Printf("Mallocs: %d, Frees: %d\n",
		ms.Mallocs, ms.Frees)
	fmt.Printf("HeapAlloc: %d MB, HeapSys: %d MB, HeapIdle: %d MB\n",
		ms.HeapAlloc/1024/1024, ms.HeapSys/1024/1024, ms.HeapIdle/1024/1024)
	fmt.Printf("HeapObjects: %d\n", ms.HeapObjects)
	fmt.Printf("\n")
}
```

This is the output:

```
loaded: 544594. Took 8 Mb of memory.Loading finished. Now waiting...

Alloc: 2667 MB, TotalAlloc: 3973 MB, Sys: 2831 MB
Mallocs: 1108463, Frees: 401665
HeapAlloc: 2667 MB, HeapSys: 2687 MB, HeapIdle: 11 MB
HeapObjects: 706798

Done!
```

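Although the keys take only about 8 MB, the program uses about 2.7 GB of memory! It seems that `sliced` is never removed from the heap. I tried setting `sliced = nil` at the end of the `for` loop, but it didn't help. I've read that I could avoid this problem by loading the whole file into memory and then splitting it, but I have to read the file line by line, because I don't have enough memory to load some of the larger files.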


Why is this memory being held? How can I free it after each line is processed?


Asked by 偶然的你 · 133 views

2 Answers

拉丁的传说

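For efficient use of both CPU and memory, split off only the first field and copy it into a new string (this needs the `bytes` package imported):

```go
key := string(bytes.SplitN(scanner.Bytes(), []byte(","), 2)[0])
mp[key] = 2.2
```

`scanner.Bytes()` avoids allocating a string for the whole line, `SplitN` with n = 2 stops at the first comma instead of splitting all 200 fields, and the `string(...)` conversion copies just the key, so the map no longer keeps the line alive.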

慕仙森

I think I found the problem! I split every line of the big file. The returned `[]string` holds substrings of the original string (the file line). The catch is that these substrings are not new strings: each one is just a view that still references the unsplit string (the whole file line!). I kept `sliced[0]` for every line, so I kept a reference to every line of the file. The garbage collector does not touch the lines I read, because I still reference them. Technically, I was reading every line of the file and keeping it in memory. This is also why setting `sliced = nil` did not help: the map key itself still pointed into the line.

The solution is to copy the part I want (`sliced[0]`) into a new string, which drops the reference to the whole line. I did it like this:

```go
sliced := strings.Split(scanner.Text(), ",")
key_rune_arr := []rune(sliced[0])
key := string(key_rune_arr) // now key is a copy of sliced[0] without reference to line
mp[key] = 2.2               // instead of mp[sliced[0]] = 2.2
```

The program now becomes:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"runtime"
	"strings"
)

func main() {
	file, err := os.Open("file_address.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	mp := make(map[string]float32)
	var total_size int64 = 0
	scanner := bufio.NewScanner(file)
	var counter int64 = 0

	for scanner.Scan() {
		counter++
		sliced := strings.Split(scanner.Text(), ",")
		key_rune_arr := []rune(sliced[0])
		key := string(key_rune_arr)   // now key is a copy of sliced[0] without reference to line
		total_size += int64(len(key)) // approximate key bytes, for the "Took ... Mb" line
		mp[key] = 2.2                 // instead of mp[sliced[0]] = 2.2
	}

	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("loaded: %d. Took %d Mb of memory.", counter, total_size/1024.0/1024.0)
	fmt.Println("Loading finished. Now waiting...")

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)

	fmt.Printf("\n")
	fmt.Printf("Alloc: %d MB, TotalAlloc: %d MB, Sys: %d MB\n",
		ms.Alloc/1024/1024, ms.TotalAlloc/1024/1024, ms.Sys/1024/1024)
	fmt.Printf("Mallocs: %d, Frees: %d\n",
		ms.Mallocs, ms.Frees)
	fmt.Printf("HeapAlloc: %d MB, HeapSys: %d MB, HeapIdle: %d MB\n",
		ms.HeapAlloc/1024/1024, ms.HeapSys/1024/1024, ms.HeapIdle/1024/1024)
	fmt.Printf("HeapObjects: %d\n", ms.HeapObjects)
	fmt.Printf("\n")
}
```

The result is just what I hoped for:

```
loaded: 544594. Took 8 Mb of memory.Loading finished. Now waiting...

Alloc: 94 MB, TotalAlloc: 3986 MB, Sys: 135 MB
Mallocs: 1653590, Frees: 1108129
HeapAlloc: 94 MB, HeapSys: 127 MB, HeapIdle: 32 MB
HeapObjects: 545461

Done!
```