猿问

PHP从一个数组创建多个给定大小的json文件

我想从一个数组创建多个 json 文件(file1.json、file2.json 等),并且每个文件的最大文件大小必须为 5 mb。


我有这样的数组:


    array (

    0 => array (

        'category' => '179535',

        'email' => NULL,

        'level' => 1,

        'name' => 'FOO'

    ),

    1 => array (

        'category' => '1795',

        'email' => NULL,

        'level' => 1,

        'name' => 'BARFOO'

    ),

    2 => array (

        'category' => '16985',

        'email' => NULL,

        'level' => 1,

        'name' => 'FOOBAR'

    ),

    ....

    

    25500 => array (

        'category' => '10055',

        'email' => NULL,

        'level' => 1,

        'name' => 'FOOBARBAR'

    )    

)

如果我用 json_encode($arr) 将其写入文件中。生成的文件大约为 85mb。那么如何拆分该数组以使每个文件最多 5 MB?


qq_笑_17
浏览 116回答 4
4回答

婷婷同学_

假设您的数据相当对称,最性能友好的选项就是简单地array_chunk()将数组切割成块,当json_encoded 时,这些块将大约是预期的大小。让我们看一下数组中的样本:string(58) "{"category":"1795","email":null,"level":1,"name":"BARFOO"}"这里的“名称”似乎是唯一可能变化更大的一个。我们将其平均为 12 个字符,每个项目的字符串长度为 64 字节。然后,您可以将其中的 78125 个放入 5MB 中。为了将其保持在标记之下,我们将其设置为 75000。然后,$chunks = array_chunk($data, 75000)将为您提供 X 个大约或略低于 5MB 标记的块。现在,如果您想要更精确,并且尺寸确实很重要......我们可以:$size = 0; // size counter$chunkno = 1; // chunk number$maxbytes = 50000; // 50000-byte chunks$chunks = []; // for array chunksforeach($data as $set) {    // if over the limit, move on to next chunk    if ($size > $maxbytes) {         $size = 0;        $chunkno++;    }    $size += strlen(json_encode($set)) + 1; // add a comma's length!    $chunks[$chunkno][] = $set;}// unset($data); // in case you have memory concerns显然,我们在这里使用 json_encode 执行双重任务,但块大小不会受到源数据差异的影响。我针对 50000 字节的块运行了上面的测试脚本,您需要将其5000000用于您的用例。我生成的虚拟数据最多分为整齐的 50K 块。+/- 一组的大小,加上最后一个文件中的剩余部分。在思考这个问题时,我也考虑过这样做strlen(implode(,但考虑到 PHP 的总体性能很好json_encode,为了获得精确的 JSON 字符串大小而进行权衡,不应该有太多的损失。无论如何,一旦块准备好了,我们需要做的就是把它们写下来:foreach($chunks as $n => $chunk) {    $json = json_encode($chunk);    file_put_contents("tmp/chunk_{$n}.json", $json);}...或者匹配您的块命名和目录架构。也许有更聪明的方法可以做到这一点。也就是说,据我所知,核心 PHP 中没有任何内容可以开箱即用地执行此类操作(即使对于普通数组也是如此),并且上述操作应该执行得相当好。请记住有足够的可用内存。:)PS 在计算大小时,我们为每个项目添加 +1,代表{},{},{},或对象分隔符。严格来说,您还需要在总计中添加 +2,因为它将是[{},{},{}],而我们只将每个数组项的长度作为单独的 JSON 对象进行计算。对于其他数据结构,您的补偿里程可能会有所不同。优化更新:如果您选择“精确大小”方法并希望优化内存使用,最好将 JSON 提交集成到分块循环中。(感谢@NigelRen的建议。)如下(其他初始变量如前):$chunk = [];foreach($data as $n => $set) {    if ($size > $maxbytes) {        file_put_contents("tmp/chunk_{$chunkno}.json", json_encode($chunk));        $chunk = [];        $chunkno++;        $size = 0;    }    $size += strlen(json_encode($set)) + 1;    $chunk[] = $set;    //  unset($data[$n]); // in case of memory issues, see notes}如果您对影响感到好奇。通过这种方法,内存使用量达到(已用,最大)1.06 MB、29.34 MB。使用单独的写入例程,26.29 MB、31.8 MB。两个数字都包括unset($data)调用、取消初始数组并释放内存。CPU 方面,两个选项之间没有显着差异。人们还可以$data在每次添加到 后清除数组的成员$chunk[],但是在 5MB 块大小下,这里的内存优势可以忽略不计。初始数组本身的加载/定义是昂贵的,是最大内存使用量的主要因素。(在任何处理开始之前,我使用的测试数组占用了 29.25 MB。)

人到中年有点甜

您可以获取strlen字节并从那里进行计算:$total_size  = strlen(json_encode($array)) / 1024 / 1024;$chunk_size  = floor($total_size / 5);$chunked_array = array_chunk($array, $chunk_size);    foreach($chunked_array as $key => $chunk) {    $i = $key + 1;    file_put_contents("file{$i}.json", json_encode($chunk));}获取 JSON 编码数组的总大小(以字节为单位)并转换为 MB将总大小除以 5MB 即可得到块大小将数组分成块大小循环和 JSON 编码每个块并写入文件或者您可以进行计算:$total_size  = strlen(json_encode($array)); $chunk_size  = floor($total_size / (5 * 1024 * 1024));

蝴蝶刀刀

让我们假设每个项目都具有相同的结构:1500 项 ~= 5MB 25500 items = ~85MB 85MB / 5MB = 17   25500 / 17 = 1500 items代码可以是这样的:foreach(array_chunk($array, 1500) as $arr){ // save array in some file}

PIPIONE

请尝试以下解决方法:<?php&nbsp; &nbsp; $array = array (&nbsp; &nbsp; &nbsp; &nbsp; 0 => array (&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'category' => '179535',&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'email' => NULL,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'level' => 1,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'name' => 'FOO'&nbsp; &nbsp; &nbsp; &nbsp; ),&nbsp; &nbsp; &nbsp; &nbsp; 1 => array (&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'category' => '1795',&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'email' => NULL,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'level' => 1,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'name' => 'BARFOO'&nbsp; &nbsp; &nbsp; &nbsp; ),&nbsp; &nbsp; &nbsp; &nbsp; 2 => array (&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'category' => '16985',&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'email' => NULL,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'level' => 1,&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 'name' => 'FOOBAR'&nbsp; &nbsp; &nbsp; &nbsp; )&nbsp; &nbsp; );&nbsp; &nbsp; $len = sizeof($array);&nbsp; &nbsp; $fileNameIndex = 1;&nbsp; &nbsp; for($i=0;$i<$len;$i++)&nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; $fileName = 'file'.$fileNameIndex.'.json';&nbsp; &nbsp; &nbsp; &nbsp; $fileExist = file_exists($fileName);&nbsp; &nbsp; &nbsp; &nbsp; $fileSize = 0;&nbsp; &nbsp; &nbsp; &nbsp; $mode ='w';&nbsp; &nbsp; &nbsp; &nbsp; $current = null;&nbsp; &nbsp; &nbsp; &nbsp; if($fileExist)&nbsp; &nbsp; &nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $fileSize = fileSize($fileName);&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $current = json_decode(file_get_contents($fileName), true);&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; if($fileExist && $fileSize < 5242880)&nbsp; &nbsp; &nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WriteToFile($fileNameIndex, $current, $array[$i], $i);&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; else if(!$fileExist)&nbsp; &nbsp; &nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WriteToFile($fileNameIndex, $current, $array[$i], $i);&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; else&nbsp; &nbsp; &nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $fileNameIndex ++;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WriteToFile($fileNameIndex, $current, $array[$i], $i);&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; }&nbsp; &nbsp; function WriteToFile($fileNameIndex, $current, $data, $i)&nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; $fileName = 'file'.$fileNameIndex.'.json';&nbsp; &nbsp; &nbsp; &nbsp; $mode ='w';&nbsp; &nbsp; &nbsp; &nbsp; echo "$i index array is being written in $fileName. <br/>";&nbsp; &nbsp; &nbsp; &nbsp; $fileNameIndex ++;&nbsp; &nbsp; &nbsp; &nbsp; $fp = fopen($fileName, $mode);&nbsp; &nbsp; &nbsp; &nbsp; if($current)&nbsp; &nbsp; &nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array_push($current, $data);&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; else&nbsp; &nbsp; &nbsp; &nbsp; {&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $current = [];&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; array_push($current, $data);&nbsp; &nbsp; &nbsp; &nbsp; }&nbsp; &nbsp; &nbsp; &nbsp; fwrite($fp, json_encode($current));&nbsp; &nbsp; &nbsp; &nbsp; fclose($fp);&nbsp; &nbsp; }?>
随时随地看视频慕课网APP
我要回答