Numba @jit(nopython=True) gives no speedup for a NumPy-heavy function

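I am currently running test_matrix_speed() to see how fast my search_and_book_availability function is. Using the PyCharm profiler, I can see that each search_and_book_availability call takes 0.001 ms on average. Adding the Numba @jit(nopython=True) decorator makes no difference to this function's performance. Is that because there is nothing left to improve and NumPy is already running as fast as it can here? (I don't care about the speed of the generate_searches function.)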


Here is the code I am running:


import random

import numpy as np
from numba import jit


def generate_searches(number, sim_start, sim_end):
    searches = []
    for i in range(number):
        start_slot = random.randint(sim_start, sim_end - 1)
        end_slot = random.randint(start_slot + 1, sim_end)
        searches.append((start_slot, end_slot))
    return searches


@jit(nopython=True)
def search_and_book_availability(matrix, search_start, search_end):
    search_slice = matrix[:, search_start:search_end]
    output = np.where(np.sum(search_slice, axis=1) == 0)[0]
    number_of_bookable_vecs = output.size
    if number_of_bookable_vecs > 0:
        if number_of_bookable_vecs == 1:
            id_to_book = output[0]
        else:
            id_to_book = np.random.choice(output)
        matrix[id_to_book, search_start:search_end] = 1
        return True
    else:
        return False


def test_matrix_speed():
    shape = (10, 1440)
    matrix = np.zeros(shape)
    sim_start = 0
    sim_end = 1440
    searches = generate_searches(1000000, sim_start, sim_end)
    for i in searches:
        search_start = i[0]
        search_end = i[1]
        availability = search_and_book_availability(matrix, search_start, search_end)


Asked by www说 (661 views)

1 Answer

眼眸繁星

Using your function and the following code to measure the speed:

import time

shape = (10, 1440)
matrix = np.zeros(shape)
sim_start = 0
sim_end = 1440
searches = generate_searches(1000000, sim_start, sim_end)

def reset():
    matrix[:] = 0

def test_matrix_speed():
    for i in searches:
        search_start = i[0]
        search_end = i[1]
        availability = search_and_book_availability(matrix, search_start, search_end)

def timeit(func):
    # warmup
    reset()
    func()
    reset()
    start = time.time()
    func()
    end = time.time()
    return end - start

print(timeit(test_matrix_speed))

I find that the jitted version takes about 11.5 s, which is slower than the version without jit. I am not an expert in Numba, but it is designed to optimize numerical code written in a non-vectorized way, in particular explicit for loops. Your code has none; it uses only vectorized operations. So I expected jit not to outperform the baseline, though I must admit I was surprised to see it do worse.

If you want to optimize your solution, the following code reduces the execution time (at least on my PC):

def search_and_book_availability_opt(matrix, search_start, search_end):
    search_slice = matrix[:, search_start:search_end]
    # We don't need to sum in order to check whether all elements are 0.
    # ndarray.any() can short-circuit and is therefore faster.
    # Also, we only need the indices of the free rows, so call
    # np.nonzero directly instead of going through np.where.
    bookable, = np.nonzero(~search_slice.any(axis=1))
    # short circuit
    if bookable.size == 0:
        return False
    # we can perform a random choice even if size is 1
    id_to_book = np.random.choice(bookable)
    matrix[id_to_book, search_start:search_end] = 1
    return True

Combine this with initializing matrix as np.zeros(shape, dtype=bool) instead of the default float64. With both changes I get an execution time of about 3.8 s, roughly a 50% improvement over your un-jitted solution and about 70% over the jitted version. Hope that helps.
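To make the sum-versus-any point concrete, here is a small check that both ways of selecting free rows return the same indices. The 4x6 matrix and the slice bounds are made up for illustration, and the equivalence relies on the matrix holding only 0/1 (non-negative) entries, as it does in the question.

```python
import numpy as np

# Toy data (an assumption for illustration): 4 rows, 6 time slots.
matrix = np.zeros((4, 6), dtype=bool)
matrix[1, 2:4] = True   # row 1 is booked inside the searched window
matrix[3, 0] = True     # row 3 is booked, but outside the searched window

search_slice = matrix[:, 1:5]

# Original approach: a row is free if its values sum to 0.
via_sum = np.where(np.sum(search_slice, axis=1) == 0)[0]

# Alternative approach: a row is free if no element is truthy.
via_any, = np.nonzero(~search_slice.any(axis=1))

print(via_sum, via_any)
```

Both arrays list rows 0, 2 and 3, so swapping one expression for the other does not change which rows are considered bookable.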
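For comparison, this is roughly what a loop-style version of the function could look like, which is the kind of code Numba is aimed at. Treat it as a sketch only: the name search_and_book_loops is hypothetical, the try/except fallback is an assumption so the snippet still runs where Numba is not installed, np.random.choice is swapped for picking a random index with np.random.randint, and I have not benchmarked it against the versions above.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption: without Numba, fall back to running as plain Python.
    def njit(func):
        return func


@njit
def search_and_book_loops(matrix, search_start, search_end):
    # Hypothetical loop-based variant of search_and_book_availability.
    n_rows = matrix.shape[0]
    free_rows = np.empty(n_rows, dtype=np.int64)
    n_free = 0
    for row in range(n_rows):
        is_free = True
        for col in range(search_start, search_end):
            if matrix[row, col] != 0:
                is_free = False
                break  # stop scanning this row at the first booked slot
        if is_free:
            free_rows[n_free] = row
            n_free += 1
    if n_free == 0:
        return False
    # Pick one of the free rows uniformly at random.
    chosen = free_rows[np.random.randint(0, n_free)]
    for col in range(search_start, search_end):
        matrix[chosen, col] = 1
    return True
```

It is called the same way as the original, for example search_and_book_loops(np.zeros((10, 1440)), 100, 200). Whether it actually beats the vectorized version would have to be measured with the timing harness above.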