NumbaPro CUDA AssertionError in Anaconda

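I am trying to use NumbaPro's CUDA extension to multiply large array matrices. What I ultimately want is to multiply an NxN matrix by a diagonal matrix that is passed in as a 1D array (i.e. a.dot(numpy.diagflat(b)), which I found to be equivalent to a * b). However, I get an AssertionError that gives no information at all.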
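Why a.dot(numpy.diagflat(b)) equals a * b: element (i, j) of the product is the sum over k of a[i, k] * diagflat(b)[k, j], and the only nonzero term is k = j, which leaves a[i, j] * b[j]. NumPy broadcasting computes exactly that when a has shape (N, N) and b has shape (N,): column j of a is scaled by b[j]. The CPU-only check below illustrates this; the helper names `diag_scale_dense` and `diag_scale_broadcast` are hypothetical and not part of the original code.

```python
import numpy as np


def diag_scale_dense(a, b):
    # Builds the full N x N diagonal matrix, then does a matrix product:
    # O(N^2) extra memory and O(N^3) arithmetic.
    return a.dot(np.diagflat(b))


def diag_scale_broadcast(a, b):
    # b (shape (N,)) is broadcast across the rows of a (shape (N, N)),
    # so column j of a is multiplied by b[j]: O(N^2) arithmetic, no diagonal matrix.
    return a * b


rng = np.random.default_rng(0)
n = 4
a = rng.random((n, n))
b = rng.random(n) + 10
print(np.allclose(diag_scale_dense(a, b), diag_scale_broadcast(a, b)))  # True
```

The broadcast form is the one worth porting to the GPU, since it never materializes the N x N diagonal matrix.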


I can only avoid this AssertionError when I multiply two 1D arrays, but that is not what I want.


from numbapro import vectorize, cuda
from numba import f4,f8
import numpy as np

def generate_input(n):
    import numpy as np
    A = np.array(np.random.sample((n,n)))
    B = np.array(np.random.sample(n) + 10)
    return A, B

def product(a, b):
    return a * b

def main():
    cu_product = vectorize([f4(f4, f4), f8(f8, f8)], target='gpu')(product)

    N = 1000

    A, B = generate_input(N)
    D = np.empty(A.shape)

    stream = cuda.stream()

    with stream.auto_synchronize():
        dA = cuda.to_device(A, stream)
        dB = cuda.to_device(B, stream)
        dD = cuda.to_device(D, stream, copy=False)
        cu_product(dA, dB, out=dD, stream=stream)
        dD.to_host(stream)

if __name__ == '__main__':
    main()

This is what my terminal spits out:


Traceback (most recent call last):
  File "cuda_vectorize.py", line 32, in <module>
    main()
  File "cuda_vectorize.py", line 28, in main
    cu_product(dA, dB, out=dD, stream=stream)
  File "/opt/anaconda1anaconda2anaconda3/lib/python2.7/site-packages/numbapro/_cudadispatch.py", line 109, in __call__
  File "/opt/anaconda1anaconda2anaconda3/lib/python2.7/site-packages/numbapro/_cudadispatch.py", line 191, in _arguments_requirement
AssertionError
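Since the 1D x 1D case reportedly works, one possible workaround is to hand the element-wise function two 1D arrays of equal length: flatten A in row-major order and repeat B once per row, so that position i * N + j pairs A[i, j] with B[j]. This is a sketch under an assumption the traceback does not confirm, namely that `_arguments_requirement` rejects the (N, N) and (N,) operands because their shapes differ. `scale_columns_flat` is a hypothetical helper, and it is only exercised here with the plain NumPy `product`; passing the question's `cu_product` instead is the intended use but is untested against NumbaPro.

```python
import numpy as np


def product(a, b):
    # Same element-wise body as the question's `product`.
    return a * b


def scale_columns_flat(A, B, elementwise=product):
    # Hypothetical helper: turn the (N, M) x (M,) case into two equal-length 1D
    # operands, which is the case the question reports as working.
    n_rows, n_cols = A.shape
    assert B.shape == (n_cols,), "B must have one entry per column of A"
    a_flat = A.ravel()            # C order: a_flat[i * n_cols + j] == A[i, j]
    b_flat = np.tile(B, n_rows)   # B repeated per row: b_flat[i * n_cols + j] == B[j]
    out_flat = elementwise(a_flat, b_flat)
    return out_flat.reshape(n_rows, n_cols)


rng = np.random.default_rng(0)
A = rng.random((6, 6))
B = rng.random(6) + 10
C = scale_columns_flat(A, B)
print(np.allclose(C, A * B))  # True
```

The price of this design is that `b_flat` occupies as much memory as A itself; a custom kernel that indexes B by column avoids that copy.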


慕桂英4014372

沧海一幻觉

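Just to build on all of these considerations: I also wanted to implement some matrix computations in CUDA, but then I heard about the numpy.einsum function, and it turns out einsum is very fast. For this case the code is the following, but it can be applied to many other kinds of computation:

G = np.einsum('ij,j -> ij',A, B)

In terms of speed, these are the results for N = 10000:

Numpy took    8.387756 seconds
CUDA JIT took 0.218394 seconds, 38.41x speedup
EINSUM took   0.131751 seconds, 63.66x speedup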
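A small sketch of the einsum approach from this answer. N is reduced to 300 so that the dense `diagflat` version stays cheap; that size is an assumption for illustration only, and the CUDA JIT code the answer compares against is not shown here. The script checks that the subscripts 'ij,j->ij' give the same result as `a * b` and `a.dot(np.diagflat(b))`, then times all three on the CPU. No timings are listed because they depend entirely on the hardware.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
n = 300
A = rng.random((n, n))
B = rng.random(n) + 10

# 'ij,j->ij': both i and j appear in the output, so no index is summed;
# each A[i, j] is simply multiplied by B[j].
G = np.einsum('ij,j->ij', A, B)

print(np.allclose(G, A * B))                  # True
print(np.allclose(G, A.dot(np.diagflat(B))))  # True

# Rough CPU-side timing of the three formulations.
candidates = {
    "diagflat + dot": lambda: A.dot(np.diagflat(B)),
    "broadcast a * b": lambda: A * B,
    "einsum": lambda: np.einsum('ij,j->ij', A, B),
}
for label, fn in candidates.items():
    best = min(timeit.repeat(fn, number=5, repeat=3))
    print(f"{label}: {best / 5:.6f} s per call")
```

Changing the output subscripts changes the operation: 'ij,j->i', for example, would sum over j and give the matrix-vector product A.dot(B) instead.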