繁星淼淼
Benchmarking

We will benchmark the proposed solutions on various datasets and draw conclusions from the results.

Timings: We are using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.

Benchmarking code:

import numpy as np
import benchit

def numpy_concatenate(a, b):
    return np.concatenate((a, b), axis=1)

def numpy_hstack(a, b):
    return np.hstack((a, b))

def preallocate(a, b):
    m, n = a.shape[1], b.shape[1]
    out = np.empty((a.shape[0], m + n), dtype=np.result_type((a.dtype, b.dtype)))
    out[:, :m] = a
    out[:, m:] = b
    return out

funcs = [numpy_concatenate, numpy_hstack, preallocate]

R = np.random.rand
inputs = {n: (R(1000, 1000), R(1000, n)) for n in [100, 200, 500, 1000, 2000, 5000]}

t = benchit.timings(funcs, inputs, multivar=True, input_name='Col length of b')
t.plot(logy=False, logx=True, savepath='plot_1000rows.png')

Conclusion: The solutions are comparable on timings.

Memory profiling

On the memory side, np.hstack should behave like np.concatenate, so we will use just one of them. Let's set up an input dataset with large 2D arrays and do some memory benchmarking.

Setup code:

# Filename : memprof_npconcat_preallocate.py
import numpy as np
from memory_profiler import profile

@profile(precision=10)
def numpy_concatenate(a, b):
    return np.concatenate((a, b), axis=1)

@profile(precision=10)
def preallocate(a, b):
    m, n = a.shape[1], b.shape[1]
    out = np.empty((a.shape[0], m + n), dtype=np.result_type((a.dtype, b.dtype)))
    out[:, :m] = a
    out[:, m:] = b
    return out

R = np.random.rand
a, b = R(1000, 1000), R(1000, 1000)

if __name__ == '__main__':
    numpy_concatenate(a, b)
    preallocate(a, b)

So, a is 1000x1000 and so is b. To run:

$ python3 -m memory_profiler memprof_npconcat_preallocate.py

Filename: memprof_npconcat_preallocate.py

Line #    Mem usage             Increment            Line Contents
================================================
     9    69.3281250000 MiB     69.3281250000 MiB    @profile(precision=10)
    10                                               def numpy_concatenate(a, b):
    11    84.5546875000 MiB     15.2265625000 MiB        return np.concatenate((a,b),axis=1)

Filename: memprof_npconcat_preallocate.py

Line #    Mem usage             Increment            Line Contents
================================================
    13    69.3554687500 MiB     69.3554687500 MiB    @profile(precision=10)
    14                                               def preallocate(a, b):
    15    69.3554687500 MiB      0.0000000000 MiB        m,n = a.shape[1], b.shape[1]
    16    69.3554687500 MiB      0.0000000000 MiB        out = np.empty((a.shape[0],m+n), dtype=np.result_type((a.dtype, b.dtype)))
    17    83.6484375000 MiB     14.2929687500 MiB        out[:,:m] = a
    18    84.4218750000 MiB      0.7734375000 MiB        out[:,m:] = b
    19    84.4218750000 MiB      0.0000000000 MiB        return out

Thus, for the preallocate method the total memory increment is 14.2929687500 + 0.7734375000 MiB, slightly less than the 15.2265625000 MiB for np.concatenate.

Changing the size of the input arrays a and b to 5000x5000:

$ python3 -m memory_profiler memprof_npconcat_preallocate.py

Filename: memprof_npconcat_preallocate.py

Line #    Mem usage              Increment             Line Contents
================================================
     9   435.4101562500 MiB     435.4101562500 MiB     @profile(precision=10)
    10                                                 def numpy_concatenate(a, b):
    11   816.8515625000 MiB     381.4414062500 MiB         return np.concatenate((a,b),axis=1)

Filename: memprof_npconcat_preallocate.py

Line #    Mem usage              Increment             Line Contents
================================================
    13   435.5351562500 MiB     435.5351562500 MiB     @profile(precision=10)
    14                                                 def preallocate(a, b):
    15   435.5351562500 MiB       0.0000000000 MiB         m,n = a.shape[1], b.shape[1]
    16   435.5351562500 MiB       0.0000000000 MiB         out = np.empty((a.shape[0],m+n), dtype=np.result_type((a.dtype, b.dtype)))
    17   780.3203125000 MiB     344.7851562500 MiB         out[:,:m] = a
    18   816.9296875000 MiB      36.6093750000 MiB         out[:,m:] = b
    19   816.9296875000 MiB       0.0000000000 MiB         return out

Again, the totals are smaller with preallocation.

Conclusion: The preallocation method has a slight memory advantage, which makes some sense. With concatenate there are three arrays involved, src1 + src2 -> dst, whereas with preallocation there are only a src and a dst at a time; it takes two steps, but there is less memory congestion.
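As a quick sanity check (not part of the original benchmark; the array sizes and the mixed-dtype case below are my own choices), the sketch below confirms that the three functions return identical results and illustrates why preallocate queries np.result_type: with inputs of different dtypes, the preallocated output gets the same promoted dtype that np.concatenate would produce.

import numpy as np

# The three candidate implementations, repeated here so the check is self-contained.
def numpy_concatenate(a, b):
    return np.concatenate((a, b), axis=1)

def numpy_hstack(a, b):
    return np.hstack((a, b))

def preallocate(a, b):
    m, n = a.shape[1], b.shape[1]
    # Passing the dtypes as separate arguments is the documented form of np.result_type.
    out = np.empty((a.shape[0], m + n), dtype=np.result_type(a.dtype, b.dtype))
    out[:, :m] = a
    out[:, m:] = b
    return out

R = np.random.rand
a, b = R(100, 100), R(100, 50)   # small arrays are enough for a correctness check

# All three approaches must produce the same concatenated array.
ref = numpy_concatenate(a, b)
assert np.array_equal(ref, numpy_hstack(a, b))
assert np.array_equal(ref, preallocate(a, b))

# Mixed dtypes: result_type applies NumPy's promotion rules, matching np.concatenate.
a32 = a.astype(np.float32)
assert preallocate(a32, b).dtype == np.concatenate((a32, b), axis=1).dtype  # float64
print("all checks passed")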
幕布斯7119047
Compiled numpy code such as concatenate normally figures out how large a return array it needs, creates that array, and copies the values into it. The fact that it does this through C-API calls makes no difference to memory use. concatenate does not overwrite or reuse any of the memory used by its arguments.

In [465]: A, B = np.ones((1000,1000)), np.zeros((1000,500))

Some time comparisons:

In [466]: timeit np.concatenate((A,B), axis=1)
6.73 ms ± 338 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [467]: C = np.zeros((1000,1500))

In [468]: timeit np.concatenate((A,B), axis=1, out=C)
6.44 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [469]: %%timeit
     ...: C = np.zeros((1000,1500))
     ...: np.concatenate((A,B), axis=1, out=C)
11.5 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [470]: %%timeit
     ...: C = np.zeros((1000,1500))
     ...: C[:,:1000]=A; C[:,1000:]=B
11.5 ms ± 282 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [471]: %%timeit
     ...: C[:,:1000]=A; C[:,1000:]=B
6.29 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So if the target array already exists, use it. But creating one just for this purpose does not seem to offer much of an advantage.
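For completeness, here is a small standalone sketch (my own, not from the answer above; array sizes and repeat counts are arbitrary) that reproduces the same three comparisons with the timeit module instead of IPython magics: concatenating into a fresh array, concatenating into an existing array via out=, and slice-assigning into an existing array.

import timeit
import numpy as np

A, B = np.ones((1000, 1000)), np.zeros((1000, 500))
C = np.zeros((1000, 1500))   # preallocated destination, reused across calls

def concat_new():
    # concatenate allocates a fresh output array on every call
    return np.concatenate((A, B), axis=1)

def concat_out():
    # concatenate writes into the existing destination via its out= argument
    return np.concatenate((A, B), axis=1, out=C)

def slice_assign():
    # copy each source into its slice of the existing destination
    C[:, :1000] = A
    C[:, 1000:] = B
    return C

for fn in (concat_new, concat_out, slice_assign):
    per_call = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{fn.__name__:13s} {per_call * 1e3:6.2f} ms per call")

As in the timings above, the interesting case is when the destination already exists; if C has to be created inside the timed region, the advantage disappears.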