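Working loop, expected result
I am trying to vectorize a slow for loop in my code, which runs on a very large dataset, to remove duplicates subject to a test. The result should keep only the rows whose first 3 elements are unique, with the 4th element being the largest among all of the duplicates. For example,
import numpy as np
inout = np.array(((0, 12, 13, 1), (0, 12, 13, 10), (1, 12, 13, 2)))
should become
out = np.array(((0, 12, 13, 10), (1, 12, 13, 2)))
Doing this with a for loop is simple, but as I mentioned, it is very slow.
unique = np.unique(inout[:, :3], axis=0)
out = np.empty((0, 4))
for i in unique:
    # rows whose first 3 columns equal the current key
    rows = np.all(inout[:, :3] == i, axis=1)
    # append the key followed by the largest 4th element among those rows
    out = np.vstack((out, np.hstack((i, np.max(inout[rows][:, 3])))))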
What I tried (1)
When I try to remove the for loop by replacing every i[:] with the index expression unique[np.arange(unique.shape[0])]:
out = np.vstack((out, np.hstack((unique[np.arange(unique.shape[0])], np.max(inout[np.all(inout[:, :3].astype(int) == unique[np.arange(unique.shape[0])], axis=1)][:, 3])))))
NumPy complains about the shape of the input passed to np.all:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 6, in all
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 2351, in all
    return _wrapreduction(a, np.logical_and, 'all', axis, None, out, keepdims=keepdims)
  File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
numpy.AxisError: axis 1 is out of bounds for array of dimension 0
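A possible reading of this traceback, sketched on the 3-row example (the name inout and the helper variables below are only illustrative): unique[np.arange(unique.shape[0])] selects every row of unique, so the comparison is between a (3, 3) array and a (2, 3) array, and those shapes cannot be broadcast. As far as I can tell, the NumPy version shown in the traceback then collapses the failed comparison to a single scalar, which would explain why np.all sees an array "of dimension 0"; newer releases may raise at the comparison itself instead.

```python
import numpy as np

inout = np.array(((0, 12, 13, 1), (0, 12, 13, 10), (1, 12, 13, 2)))
unique = np.unique(inout[:, :3], axis=0)

# Indexing with the full range of row indices returns all rows, not one row.
all_rows = unique[np.arange(unique.shape[0])]
same_as_unique = np.array_equal(all_rows, unique)

lhs_shape = inout[:, :3].shape   # (number of rows, 3)
rhs_shape = all_rows.shape       # (number of unique keys, 3)

# np.broadcast raises ValueError when the shapes are incompatible.
try:
    np.broadcast(inout[:, :3], all_rows)
    shapes_broadcast = True
except ValueError:
    shapes_broadcast = False

print(lhs_shape, rhs_shape, shapes_broadcast)
```

So the loop variable i (one key of shape (3,)) cannot simply be swapped for the whole unique array; an extra axis is needed to pair every row with every key.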
What I tried (2)
Following a suggestion StackOverflow offered while I was typing this question (Broadcasting/Vectorizing inner and outer for loops in python/NumPy):
newout = np.vstack((newout, np.hstack((tempunique[:, None], np.max(inout[np.all(inout[:, :3].astype(int) == tempunique[:, None], axis=1)][:, 3])))))
I get an error complaining about a size mismatch between the indexed array and the boolean index:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 2
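The shapes seem to explain this error too. With U unique keys and N rows, tempunique[:, None] has shape (U, 1, 3), the comparison broadcasts to (U, N, 3), and reducing over axis=1 removes the row axis rather than the column axis, leaving a (U, 3) mask whose first dimension is 2 instead of 3. Below is a minimal sketch of one way to broadcast it that does work on the example, assuming an integer array and enough memory for a (U, N, 3) intermediate; the names membership, fill and maxima are made up for illustration.

```python
import numpy as np

inout = np.array(((0, 12, 13, 1), (0, 12, 13, 10), (1, 12, 13, 2)))
unique = np.unique(inout[:, :3], axis=0)                  # shape (U, 3)

# Attempt (2): (N, 3) == (U, 1, 3) -> (U, N, 3); axis=1 drops the row axis.
bad_mask = np.all(inout[:, :3] == unique[:, None], axis=1)   # shape (U, 3)

# Reduce over the column axis instead:
# membership[u, n] is True when row n has key u.
membership = np.all(inout[None, :, :3] == unique[:, None, :], axis=2)  # (U, N)

# Replace non-members with the smallest integer of this dtype,
# then take the maximum of the 4th column per key.
fill = np.iinfo(inout.dtype).min
maxima = np.where(membership, inout[:, 3], fill).max(axis=1)  # shape (U,)

out = np.column_stack((unique, maxima))
print(out)
```

In this example N happens to equal the number of key columns (3), so bad_mask and membership have the same shape even though their second axes mean different things. The intermediate grows with U times N, so this may not suit a very large dataset.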
Question restated
Is there a correct way to broadcast my indexing so that the for loop can be eliminated?
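For comparison, here is a sketch of a sort-based alternative that avoids broadcasting every row against every key. It is not the broadcasting fix asked about, only one option that might scale better: sort by the three key columns and then by the 4th column, so that the last row of each run of equal keys carries the maximum. The names order, s and is_last are illustrative.

```python
import numpy as np

inout = np.array(((0, 12, 13, 1), (0, 12, 13, 10), (1, 12, 13, 2)))

# np.lexsort treats the LAST key as the primary one, so column 0 goes last
# and the 4th column (the tie-breaker within a key) goes first.
order = np.lexsort((inout[:, 3], inout[:, 2], inout[:, 1], inout[:, 0]))
s = inout[order]

# A row is the last of its group when the next row has a different key.
is_last = np.ones(len(s), dtype=bool)
is_last[:-1] = np.any(s[1:, :3] != s[:-1, :3], axis=1)

out = s[is_last]
print(out)
```

The output rows come back sorted by key, which matches the ordering np.unique produces in the loop version.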