Why is np.where() faster on a copy of an array slice than on a view of the original array?

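I was profiling some code and was surprised by the results for np.where(). I wanted to use where() on a slice of an array (I know that a large part of the 2D array is irrelevant to my search), and found that it was the bottleneck in my code. As a test, I created a new 2D array as a copy of that slice and timed where() on it. It turned out to run noticeably faster. In my real case the speedup is very significant, but I think this test code still demonstrates what I found: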


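import numpy as np


def where_on_view(arr):
    # Compare and select directly on the (non-contiguous) slice view
    new_arr = np.where(arr[:, 25:75] == 5, arr[:, 25:75], np.nan)
    return new_arr


def where_on_copy(arr):
    # Copy the slice into a new contiguous array first, then compare and select
    copied_arr = arr[:, 25:75].copy()
    new_arr = np.where(copied_arr == 5, copied_arr, np.nan)
    return new_arr


arr = np.random.choice(np.arange(10), 1000000).reshape(1000, 1000)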

And the timeit results:


%timeit where_on_view(arr)
398 µs ± 2.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit where_on_copy(arr)
295 µs ± 6.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
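A quick way to see how the two inputs differ is to inspect their memory layout. This is a small sketch (not part of the original measurements) using the same arr setup as above: the slice arr[:, 25:75] used in where_on_view is a view that keeps arr's row stride, so NumPy reports it as not C-contiguous, while the copy packs its rows back to back and is C-contiguous. The exact stride values depend on the platform's default integer size, so no particular numbers are assumed.

```python
import numpy as np

arr = np.random.choice(np.arange(10), 1000000).reshape(1000, 1000)
view = arr[:, 25:75]   # slice: a view into arr's buffer
copy = view.copy()     # fresh buffer holding only the 1000 x 50 slice

# The view keeps arr's row stride (1000 elements), so consecutive rows are
# not adjacent in memory; the copy's rows are stored back to back.
print("view:", view.flags["C_CONTIGUOUS"], view.strides)
print("copy:", copy.flags["C_CONTIGUOUS"], copy.strides)

# Only the view shares memory with arr.
print(np.shares_memory(view, arr), np.shares_memory(copy, arr))
```

The view reports C_CONTIGUOUS as False and the copy as True; the answer below explains why that flag matters for ufuncs such as the == comparison.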

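Since both approaches return a new array, it isn't clear to me why making a full copy of the slice beforehand would speed up np.where() this much. I also ran some sanity checks to confirm that: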


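- Both return the same result in this case.
- The where() search really is restricted to the slice, rather than checking the whole array and then filtering the output.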

Here they are:


# Sanity check that they do give the same output
test_arr = np.random.choice(np.arange(3), 25).reshape(5, 5)
test_arr_copy = test_arr[:, 1:3].copy()

print("No copy")
print(np.where(test_arr[:, 1:3] == 2, test_arr[:, 1:3], np.nan))
print("With copy")
print(np.where(test_arr_copy == 2, test_arr_copy, np.nan))


# Sanity check that it doesn't search the whole array
def where_on_full_array(arr):
    new_arr = np.where(arr == 5, arr, np.nan)
    return new_arr


#%timeit where_on_full_array(arr)
#7.54 ms ± 47.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
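The sanity checks above rely on reading printed output by eye. A minimal sketch of the same first check done with assertions, assuming the same 5 x 5 test setup: np.testing.assert_array_equal treats NaNs in matching positions as equal, so it only fails if the view-based and copy-based results genuinely differ, and the shape check confirms that both results cover only the slice.

```python
import numpy as np

test_arr = np.random.choice(np.arange(3), 25).reshape(5, 5)
test_arr_copy = test_arr[:, 1:3].copy()

no_copy = np.where(test_arr[:, 1:3] == 2, test_arr[:, 1:3], np.nan)
with_copy = np.where(test_arr_copy == 2, test_arr_copy, np.nan)

# Both results cover only the 5 x 2 slice, not the full 5 x 5 array.
assert no_copy.shape == with_copy.shape == (5, 2)

# assert_array_equal treats NaNs in the same positions as equal, so this
# raises AssertionError only if the two results really differ.
np.testing.assert_array_equal(no_copy, with_copy)
print("view and copy results match")
```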

I'm curious: where does the extra overhead come from in this case?
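One way to narrow the question down is to time the stages separately. The sketch below uses a hypothetical helper, time_stages, which assumes the cost can be split into the == comparison, the np.where selection, and the copy itself, and times each on the view and on the copy with the standard timeit module. Absolute numbers and ratios depend on hardware and NumPy version, so no particular outcome is implied.

```python
import timeit

import numpy as np


def time_stages(arr, number=200):
    """Return mean seconds per call for each stage, on a view vs. a copy.

    Hypothetical helper for illustration; results are machine-dependent.
    """
    view = arr[:, 25:75]
    copy = view.copy()
    results = {"copy step": timeit.timeit(view.copy, number=number) / number}
    for label, a in (("view", view), ("copy", copy)):
        mask = a == 5  # precomputed so the where() timing excludes the comparison
        results[f"{label}: a == 5"] = (
            timeit.timeit(lambda: a == 5, number=number) / number
        )
        results[f"{label}: np.where"] = (
            timeit.timeit(lambda: np.where(mask, a, np.nan), number=number) / number
        )
    return results


arr = np.random.choice(np.arange(10), 1000000).reshape(1000, 1000)
for stage, seconds in time_stages(arr).items():
    print(f"{stage:>16}: {seconds * 1e6:8.1f} µs")
```

Separating the comparison from the selection shows whether the gap sits in a particular ufunc step or in np.where itself.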


慕容708150
1 Answer

婷婷同学_

Here are some source-code snippets that at least partially explain the observation. I didn't look at where itself, because the difference seems to arise before it is reached; instead I looked at ufuncs in general (the == comparison in your code is a ufunc, np.equal).

Basic structure of a ufunc

Ignoring some special-cased functions for the moment, a ufunc is computed by an innermost one-dimensional loop, which may be optimized, running inside an outer loop that covers the remaining dimensions. The outer loop is the more expensive one: it uses a numpy nditer, which has to be set up, and every iteration calls iternext, which is a function pointer and therefore cannot be inlined. By contrast, the inner loop is a simple C loop.

Strided ufunc evaluation incurs considerable overhead

From numpy/core/src/private/lowlevel_strided_loops.h, which is included by numpy/core/src/umath/ufunc_object.c:

/*
 *            TRIVIAL ITERATION
 *
 * In some cases when the iteration order isn't important, iteration over
 * arrays is trivial.  This is the case when:
 *   * The array has 0 or 1 dimensions.
 *   * The array is C or Fortran contiguous.
 * Use of an iterator can be skipped when this occurs.  These macros assist
 * in detecting and taking advantage of the situation.  Note that it may
 * be worthwhile to further check if the stride is a contiguous stride
 * and take advantage of that.

So a ufunc with contiguous arguments can be evaluated by a single call to the fast inner loop, bypassing the outer loop entirely.

To see the difference in complexity and overhead, compare the functions trivial_two/three_operand_loop with iterator_loop in numpy/core/src/umath/ufunc_object.c, and look at all the npyiter_iternext_* functions in numpy/core/src/multiarray/nditer_templ.c.

Strided ufunc evaluation is more expensive than a strided copy

From the auto-generated numpy/core/src/multiarray/lowlevel_strided_loops.c:

/*
 * This file contains low-level loops for copying and byte-swapping
 * strided data.
 *

This file is nearly 250,000 lines long. By comparison, the also auto-generated numpy/core/src/umath/loops.c, which provides the innermost ufunc loops, is only about 15,000 lines. That alone suggests that copying may be more heavily optimized than ufunc evaluation.

Relevant here are the macros

/* Start raw iteration */
#define NPY_RAW_ITER_START(idim, ndim, coord, shape) \
        memset((coord), 0, (ndim) * sizeof(coord[0])); \
        do {

[...]

/* Increment to the next n-dimensional coordinate for two raw arrays */
#define NPY_RAW_ITER_TWO_NEXT(idim, ndim, coord, shape, \
                              dataA, stridesA, dataB, stridesB) \
            for ((idim) = 1; (idim) < (ndim); ++(idim)) { \
                if (++(coord)[idim] == (shape)[idim]) { \
                    (coord)[idim] = 0; \
                    (dataA) -= ((shape)[idim] - 1) * (stridesA)[idim]; \
                    (dataB) -= ((shape)[idim] - 1) * (stridesB)[idim]; \
                } \
                else { \
                    (dataA) += (stridesA)[idim]; \
                    (dataB) += (stridesB)[idim]; \
                    break; \
                } \
            } \
        } while ((idim) < (ndim))

which are used by the function raw_array_assign_array in numpy/core/src/multiarray/array_assign_array.c, the function that performs the actual copy for the Python ndarray.copy method.

We can see that this "raw iteration" has fairly small overhead compared with the "full iteration" used by ufuncs.
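To make the two ideas in this answer concrete, here is a rough Python transliteration (my own sketch, not NumPy code; is_trivially_iterable and raw_strided_copy are hypothetical names). The first function restates the "trivial iteration" condition from the quoted comment. The second mirrors the control flow of NPY_RAW_ITER_START / NPY_RAW_ITER_TWO_NEXT: a plain strided inner loop plus an odometer-style step over the outer dimensions. Simplifying assumptions: offsets and strides are in elements rather than bytes, the buffers are flat 1-D arrays, and shape/strides are passed innermost dimension first, which is what the macro's idim = 1 starting point implies. Being pure Python it is far slower than NumPy; it illustrates the bookkeeping, not the speed.

```python
import numpy as np


def is_trivially_iterable(a):
    # Condition from the quoted lowlevel_strided_loops.h comment:
    # 0 or 1 dimensions, or C- or Fortran-contiguous.
    return a.ndim <= 1 or a.flags.c_contiguous or a.flags.f_contiguous


def raw_strided_copy(src, src_off, src_strides, dst, dst_off, dst_strides, shape):
    """Copy a strided n-d region between flat buffers, odometer-style.

    Offsets/strides are in elements; index 0 is the innermost dimension.
    """
    ndim = len(shape)
    if 0 in shape:
        return
    if ndim == 0:
        dst[dst_off] = src[src_off]
        return
    coord = [0] * ndim  # like NPY_RAW_ITER_START: zero the coordinates
    a, b = src_off, dst_off
    while True:
        # Innermost dimension: a plain strided loop.
        for i in range(shape[0]):
            dst[b + i * dst_strides[0]] = src[a + i * src_strides[0]]
        # Like NPY_RAW_ITER_TWO_NEXT: advance the outer coordinates.
        idim = 1
        while idim < ndim:
            coord[idim] += 1
            if coord[idim] == shape[idim]:
                # This dimension wrapped: rewind it and carry to the next one.
                coord[idim] = 0
                a -= (shape[idim] - 1) * src_strides[idim]
                b -= (shape[idim] - 1) * dst_strides[idim]
                idim += 1
            else:
                a += src_strides[idim]
                b += dst_strides[idim]
                break
        if idim >= ndim:  # every outer coordinate wrapped: done
            return


base = np.arange(20, dtype=np.int64).reshape(4, 5)
view = base[:, 1:4]  # non-contiguous 4 x 3 view starting at element 1
out = np.empty(view.shape, dtype=base.dtype)

src_strides = [s // base.itemsize for s in view.strides][::-1]  # [1, 5]
dst_strides = [s // out.itemsize for s in out.strides][::-1]    # [1, 3]
raw_strided_copy(base.ravel(), 1, src_strides, out.ravel(), 0, dst_strides,
                 list(view.shape)[::-1])

print(is_trivially_iterable(view), is_trivially_iterable(out))  # False True
print(np.array_equal(out, view))                                 # True
```

Note that out.ravel() returns a view of the freshly allocated C-contiguous out, so the writes land in out itself.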