繁星淼淼
Benchmarking

We will benchmark the proposed solutions on various datasets and draw conclusions from the results.

Timings: We are using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.

Benchmarking code:

import numpy as np
import benchit

def numpy_concatenate(a, b):
    return np.concatenate((a, b), axis=1)

def numpy_hstack(a, b):
    return np.hstack((a, b))

def preallocate(a, b):
    m, n = a.shape[1], b.shape[1]
    out = np.empty((a.shape[0], m + n), dtype=np.result_type((a.dtype, b.dtype)))
    out[:, :m] = a
    out[:, m:] = b
    return out

funcs = [numpy_concatenate, numpy_hstack, preallocate]

R = np.random.rand
inputs = {n: (R(1000, 1000), R(1000, n)) for n in [100, 200, 500, 1000, 2000, 5000]}

t = benchit.timings(funcs, inputs, multivar=True, input_name='Col length of b')
t.plot(logy=False, logx=True, savepath='plot_1000rows.png')

Conclusion: The solutions are comparable on timings.

Memory profiling

On the memory side, np.hstack should behave like np.concatenate, so we will use just one of them. Let's set up an input dataset with large 2D arrays and do some memory benchmarking.

Setup code:

# Filename : memprof_npconcat_preallocate.py
import numpy as np
from memory_profiler import profile

@profile(precision=10)
def numpy_concatenate(a, b):
    return np.concatenate((a, b), axis=1)

@profile(precision=10)
def preallocate(a, b):
    m, n = a.shape[1], b.shape[1]
    out = np.empty((a.shape[0], m + n), dtype=np.result_type((a.dtype, b.dtype)))
    out[:, :m] = a
    out[:, m:] = b
    return out

R = np.random.rand
a, b = R(1000, 1000), R(1000, 1000)

if __name__ == '__main__':
    numpy_concatenate(a, b)
    preallocate(a, b)

So, a is 1000x1000 and so is b. To run:

$ python3 -m memory_profiler memprof_npconcat_preallocate.py

Filename: memprof_npconcat_preallocate.py

Line #    Mem usage             Increment            Line Contents
================================================
     9    69.3281250000 MiB     69.3281250000 MiB    @profile(precision=10)
    10                                               def numpy_concatenate(a, b):
    11    84.5546875000 MiB     15.2265625000 MiB        return np.concatenate((a,b),axis=1)

Filename: memprof_npconcat_preallocate.py

Line #    Mem usage             Increment            Line Contents
================================================
    13    69.3554687500 MiB     69.3554687500 MiB    @profile(precision=10)
    14                                               def preallocate(a, b):
    15    69.3554687500 MiB      0.0000000000 MiB        m,n = a.shape[1], b.shape[1]
    16    69.3554687500 MiB      0.0000000000 MiB        out = np.empty((a.shape[0],m+n), dtype=np.result_type((a.dtype, b.dtype)))
    17    83.6484375000 MiB     14.2929687500 MiB        out[:,:m] = a
    18    84.4218750000 MiB      0.7734375000 MiB        out[:,m:] = b
    19    84.4218750000 MiB      0.0000000000 MiB        return out

Thus, for the preallocate method the total memory increment is 14.2929687500 + 0.7734375000 MiB, slightly less than the 15.2265625000 MiB for np.concatenate.

Changing the size of the input arrays a and b to 5000x5000:

$ python3 -m memory_profiler memprof_npconcat_preallocate.py

Filename: memprof_npconcat_preallocate.py

Line #    Mem usage              Increment             Line Contents
================================================
     9   435.4101562500 MiB     435.4101562500 MiB     @profile(precision=10)
    10                                                 def numpy_concatenate(a, b):
    11   816.8515625000 MiB     381.4414062500 MiB         return np.concatenate((a,b),axis=1)

Filename: memprof_npconcat_preallocate.py

Line #    Mem usage              Increment             Line Contents
================================================
    13   435.5351562500 MiB     435.5351562500 MiB     @profile(precision=10)
    14                                                 def preallocate(a, b):
    15   435.5351562500 MiB       0.0000000000 MiB         m,n = a.shape[1], b.shape[1]
    16   435.5351562500 MiB       0.0000000000 MiB         out = np.empty((a.shape[0],m+n), dtype=np.result_type((a.dtype, b.dtype)))
    17   780.3203125000 MiB     344.7851562500 MiB         out[:,:m] = a
    18   816.9296875000 MiB      36.6093750000 MiB         out[:,m:] = b
    19   816.9296875000 MiB       0.0000000000 MiB         return out

Again, the totals are smaller with preallocation.

Conclusion: The preallocation method has a slight memory advantage, which makes some sense. With concatenate there are three arrays involved, src1 + src2 -> dst, whereas with preallocation there are only a src and a dst at a time; it takes two steps, but there is less memory congestion.
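As a quick sanity check (not part of the original benchmark; the array sizes and the mixed-dtype case below are my own choices), the sketch below confirms that the three functions return identical results and illustrates why preallocate queries np.result_type: with inputs of different dtypes, the preallocated output gets the same promoted dtype that np.concatenate would produce.

import numpy as np

# The three candidate implementations, repeated here so the check is self-contained.
def numpy_concatenate(a, b):
    return np.concatenate((a, b), axis=1)

def numpy_hstack(a, b):
    return np.hstack((a, b))

def preallocate(a, b):
    m, n = a.shape[1], b.shape[1]
    # Passing the dtypes as separate arguments is the documented form of np.result_type.
    out = np.empty((a.shape[0], m + n), dtype=np.result_type(a.dtype, b.dtype))
    out[:, :m] = a
    out[:, m:] = b
    return out

R = np.random.rand
a, b = R(100, 100), R(100, 50)   # small arrays are enough for a correctness check

# All three approaches must produce the same concatenated array.
ref = numpy_concatenate(a, b)
assert np.array_equal(ref, numpy_hstack(a, b))
assert np.array_equal(ref, preallocate(a, b))

# Mixed dtypes: result_type applies NumPy's promotion rules, matching np.concatenate.
a32 = a.astype(np.float32)
assert preallocate(a32, b).dtype == np.concatenate((a32, b), axis=1).dtype  # float64
print("all checks passed")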
幕布斯7119047
Compiled numpy code such as concatenate normally figures out how large a return array it needs, creates that array, and copies the values into it. The fact that it does this through C-API calls makes no difference to memory use. concatenate does not overwrite or reuse any of the memory used by its arguments.

In [465]: A, B = np.ones((1000,1000)), np.zeros((1000,500))

Some time comparisons:

In [466]: timeit np.concatenate((A,B), axis=1)
6.73 ms ± 338 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [467]: C = np.zeros((1000,1500))

In [468]: timeit np.concatenate((A,B), axis=1, out=C)
6.44 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [469]: %%timeit
     ...: C = np.zeros((1000,1500))
     ...: np.concatenate((A,B), axis=1, out=C)
11.5 ms ± 358 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [470]: %%timeit
     ...: C = np.zeros((1000,1500))
     ...: C[:,:1000]=A; C[:,1000:]=B
11.5 ms ± 282 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [471]: %%timeit
     ...: C[:,:1000]=A; C[:,1000:]=B
6.29 ms ± 160 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

So if the target array already exists, use it. But creating one just for this purpose does not seem to offer much of an advantage.
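For completeness, here is a small standalone sketch (my own, not from the answer above; array sizes and repeat counts are arbitrary) that reproduces the same three comparisons with the timeit module instead of IPython magics: concatenating into a fresh array, concatenating into an existing array via out=, and slice-assigning into an existing array.

import timeit
import numpy as np

A, B = np.ones((1000, 1000)), np.zeros((1000, 500))
C = np.zeros((1000, 1500))   # preallocated destination, reused across calls

def concat_new():
    # concatenate allocates a fresh output array on every call
    return np.concatenate((A, B), axis=1)

def concat_out():
    # concatenate writes into the existing destination via its out= argument
    return np.concatenate((A, B), axis=1, out=C)

def slice_assign():
    # copy each source into its slice of the existing destination
    C[:, :1000] = A
    C[:, 1000:] = B
    return C

for fn in (concat_new, concat_out, slice_assign):
    per_call = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{fn.__name__:13s} {per_call * 1e3:6.2f} ms per call")

As in the timings above, the interesting case is when the destination already exists; if C has to be created inside the timed region, the advantage disappears.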