How to avoid using _siftup or _siftdown in heapq

I have no idea how to solve the following problem efficiently without using _siftup or _siftdown:

How do you restore the heap invariant when one element is out of order?

In other words, update old_value in heap to new_value, and keep the heap working. You can assume there is only one old_value in the heap. The function is defined like:

def update_value_in_heap(heap, old_value, new_value):

Here is my real scenario, read it if you are interested.

  • You can imagine it as a small autocomplete system. I need to count the frequency of words and maintain the top k max-count words, ready for output at any time. So I use a heap here. When one word's count increases, I need to update it if it is in the heap.

  • All the words and counts are stored in the leaves of a trie-tree, and heaps
    are stored in the trie-tree's middle nodes. If you are wondering how to get
    the word from outside the heap, don't worry: I can get it from the trie-tree's leaf nodes.

  • When a user types a word, it will first be read from the heap and then
    updated. For better performance, we can consider decreasing the update frequency by batching updates.
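As an aside, for the plain top-k-by-count part of this scenario the standard library can be used directly (a minimal sketch that ignores the trie; top_k_words is a name I made up for illustration):

```python
from collections import Counter
import heapq

def top_k_words(counts, k):
    """Return the k most frequent (count, word) pairs, largest count first."""
    # heapq.nlargest keeps an internal heap of size k: O(n log k) overall.
    return heapq.nlargest(k, ((c, w) for w, c in counts.items()))

counts = Counter(["the", "cat", "the", "hat", "the", "cat"])
print(top_k_words(counts, 2))  # [(3, 'the'), (2, 'cat')]
```

This rebuilds the top-k from scratch on each call, which is only reasonable when reads are rare; the question below is precisely about avoiding that full rebuild.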

So how do you update the heap when a particular word's count increases?

Here is a simple example of the _siftup or _siftdown version (not my scenario):

>>> from heapq import _siftup, _siftdown, heapify, heappop

>>> data = [10, 5, 18, 2, 37, 3, 8, 7, 19, 1]
>>> heapify(data)
>>> old, new = 8, 22              # increase the 8 to 22
>>> i = data.index(old)
>>> data[i] = new
>>> _siftup(data, i)
>>> [heappop(data) for i in range(len(data))]
[1, 2, 3, 5, 7, 10, 18, 19, 22, 37]

>>> data = [10, 5, 18, 2, 37, 3, 8, 7, 19, 1]
>>> heapify(data)
>>> old, new = 8, 4              # decrease the 8 to 4
>>> i = data.index(old)
>>> data[i] = new
>>> _siftdown(data, 0, i)
>>> [heappop(data) for i in range(len(data))]
[1, 2, 3, 4, 5, 7, 10, 18, 19, 37]

Indexing costs O(n) and updating costs O(log n). heapify is another solution, but it is less efficient than _siftup or _siftdown.
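For reference, the heapify fallback written out with only public heapq calls, reusing the question's update_value_in_heap signature (a sketch):

```python
import heapq

def update_value_in_heap(heap, old_value, new_value):
    # Public-API-only variant: O(n) index lookup plus O(n) heapify repair.
    i = heap.index(old_value)
    heap[i] = new_value
    heapq.heapify(heap)

data = [10, 5, 18, 2, 37, 3, 8, 7, 19, 1]
heapq.heapify(data)
update_value_in_heap(data, 8, 22)      # same update as the example above
result = [heapq.heappop(data) for _ in range(len(data))]
```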


But _siftup and _siftdown are protected members of heapq, so accessing them from outside is not recommended.
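If the protected members are the only concern, the two sift operations are short enough to reimplement yourself (a sketch; sift_to_root and sift_to_leaves are my own names for what heapq internally calls _siftdown and _siftup):

```python
from heapq import heapify, heappop

def sift_to_root(heap, i):
    # Move heap[i] toward the root while it is smaller than its parent.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] < heap[parent]:
            heap[i], heap[parent] = heap[parent], heap[i]
            i = parent
        else:
            return

def sift_to_leaves(heap, i):
    # Move heap[i] toward the leaves while it is larger than a child.
    n = len(heap)
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def update_value_in_heap(heap, old_value, new_value):
    i = heap.index(old_value)      # the O(n) lookup remains
    heap[i] = new_value
    if new_value < old_value:
        sift_to_root(heap, i)      # value decreased: may need to rise
    else:
        sift_to_leaves(heap, i)    # value increased: may need to sink

data = [10, 5, 18, 2, 37, 3, 8, 7, 19, 1]
heapify(data)
update_value_in_heap(data, 8, 22)
result = [heappop(data) for _ in range(len(data))]
```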


So is there a better and more efficient way to solve this problem? What is the best practice for this situation?


Thanks for reading, I really appreciate your help. :)


慕桂英3389331
2 Answers

慕勒3428872

Use heapify.

One important thing you have to keep in mind is that theoretical complexity and performance are two different things (even though they are related). In other words, implementation matters too. Asymptotic complexities give you lower bounds that you can see as guarantees; for example, an algorithm in O(n) ensures that in the worst case you will execute a number of instructions linear in the input size. There are two important things here:

  • constants are ignored, but constants matter in real life;
  • the worst case depends on the algorithm you consider, not only on the input.

Depending on the topic/problem you consider, the first point can be very important. In some domains, the constants hidden in asymptotic complexities are so big that you can't even build inputs bigger than the constants (or such inputs wouldn't be realistic to consider). That's not the case here, but it's something you always have to keep in mind.

Given these two observations, you can't really say: implementation B is faster than A because A is derived from an O(n) algorithm and B is derived from an O(log n) algorithm. Even if that's a good argument to start with, it's not always sufficient. Theoretical complexities are especially good for comparing algorithms when all inputs are equally likely to happen. In other words, when your algorithms are very generic.

In the case where you know what your use cases and inputs will be, you can simply test for performance. Using both the tests and the asymptotic complexity will give you a good idea of how your algorithm will perform (in both extreme cases and arbitrary practical cases).

That being said, let's run some performance tests on the following class, which implements three different strategies (there are actually four strategies here, but Invalidate and Reinsert doesn't seem right in your case, as you'd invalidate each item as many times as you see a given word). I'll include most of my code so you can double-check that I haven't messed up (you can even check the complete notebook):

from heapq import _siftup, _siftdown, heapify, heappop

class Heap(list):

    def __init__(self, values, sort=False, heap=False):
        super().__init__(values)
        heapify(self)
        self._broken = False
        self.sort = sort
        self.heap = heap or not sort

    # Solution 1) repair using the knowledge we have after every update:
    def update(self, key, value):
        old, self[key] = self[key], value
        if value > old:
            _siftup(self, key)
        else:
            _siftdown(self, 0, key)

    # Solution 2 and 3) repair using sort/heapify in a lazzy way:
    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._broken = True

    def __getitem__(self, key):
        if self._broken:
            self._repair()
            self._broken = False
        return super().__getitem__(key)

    def _repair(self):
        if self.sort:
            self.sort()
        elif self.heap:
            heapify(self)

    # … you'll also need to delegate all other heap functions, for example:
    def pop(self):
        self._repair()
        return heappop(self)

We can first check that all three approaches work:

data = [10, 5, 18, 2, 37, 3, 8, 7, 19, 1]

heap = Heap(data[:])
heap.update(8, 22)
heap.update(7, 4)
print(heap)

heap = Heap(data[:], sort_fix=True)
heap[8] = 22
heap[7] = 4
print(heap)

heap = Heap(data[:], heap_fix=True)
heap[8] = 22
heap[7] = 4
print(heap)

Then we can run some performance tests using the following functions:

import time
import random

def rand_update(heap, lazzy_fix=False, **kwargs):
    index = random.randint(0, len(heap)-1)
    new_value = random.randint(max_int+1, max_int*2)
    if lazzy_fix:
        heap[index] = new_value
    else:
        heap.update(index, new_value)

def rand_updates(n, heap, lazzy_fix=False, **kwargs):
    for _ in range(n):
        rand_update(heap, lazzy_fix)

def run_perf_test(n, data, **kwargs):
    test_heap = Heap(data[:], **kwargs)
    t0 = time.time()
    rand_updates(n, test_heap, **kwargs)
    test_heap[0]
    return (time.time() - t0)*1e3

results = []
max_int = 500
nb_updates = 1

for i in range(3, 7):
    test_size = 10**i
    test_data = [random.randint(0, max_int) for _ in range(test_size)]

    perf = run_perf_test(nb_updates, test_data)
    results.append((test_size, "update", perf))

    perf = run_perf_test(nb_updates, test_data, lazzy_fix=True, heap_fix=True)
    results.append((test_size, "heapify", perf))

    perf = run_perf_test(nb_updates, test_data, lazzy_fix=True, sort_fix=True)
    results.append((test_size, "sort", perf))

And the results:

import pandas as pd
import seaborn as sns

dtf = pd.DataFrame(results, columns=["heap size", "method", "duration (ms)"])
print(dtf)

sns.lineplot(
    data=dtf,
    x="heap size",
    y="duration (ms)",
    hue="method",
)

From these tests we can see that heapify seems to be the most reasonable choice: it has a decent worst-case complexity, O(n), and performs better in practice. On the other hand, it's probably a good idea to investigate the other options (like having a data structure dedicated to that particular problem; for example, using bins to drop words into, then moving them from one bin to the next looks like a possible track to investigate).

Important remark: this scenario (update vs. read ratio of 1:1) is unfavorable to both the heapify and sort solutions. So if you manage to have a k:1 ratio, this conclusion will be even clearer (you can replace nb_updates = 1 with nb_updates = k in the code above).

Dataframe details:

    heap size   method  duration in ms
0        1000   update        0.435114
1        1000  heapify        0.073195
2        1000     sort        0.101089
3       10000   update        1.668930
4       10000  heapify        0.480175
5       10000     sort        1.151085
6      100000   update       13.194084
7      100000  heapify        4.875898
8      100000     sort       11.922121
9     1000000   update      153.587103
10    1000000  heapify       51.237106
11    1000000     sort      145.306110
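For completeness, the "Invalidate and Reinsert" strategy set aside above is usually implemented with lazy deletion, along the lines of the priority-queue recipe in the heapq documentation (a sketch; entry_finder, REMOVED, and the tiebreak counter follow that recipe's conventions):

```python
import heapq
import itertools

heap = []                      # entries: [value, tiebreak, key]
entry_finder = {}              # key -> live entry
REMOVED = object()             # sentinel marking an invalidated entry
counter = itertools.count()    # tiebreaker so keys are never compared

def update(key, value):
    """Invalidate any old entry for key and push a fresh one: O(log n)."""
    if key in entry_finder:
        entry_finder[key][2] = REMOVED
    entry = [value, next(counter), key]
    entry_finder[key] = entry
    heapq.heappush(heap, entry)

def pop():
    """Pop the smallest live entry, skipping invalidated ones."""
    while heap:
        value, _, key = heapq.heappop(heap)
        if key is not REMOVED:
            del entry_finder[key]
            return value, key
    raise KeyError("pop from an empty queue")

update("cat", 5)
update("dog", 3)
update("cat", 1)               # cat's old entry is lazily invalidated
first = pop()                  # (1, 'cat')
```

The cost is that dead entries linger in the heap until popped, which is exactly why the answer rules it out for a counter that is updated on every word occurrence.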

RISEBY

The code snippets he provided are completely broken! They are also hard to read. _siftup() is called n//2 times by heapify(), so it can't be faster than _siftup() itself.

To answer the original question: there is no better way. If you are worried about the methods being private, create your own that do the same thing.

The only point I agree with is that if you don't need to read from the heap for a long time, it might be beneficial to lazily heapify() it once you do need it. The question is whether you should be using a heap for that at all, then.

Let's go over the problems with his snippet. The heapify() function gets called more than once during his "update" run. The chain of mistakes that leads to this is as follows:

  • He passes heap_fix, but expects heap (and the same goes for sort)
  • Since self.sort is always False, self.heap is always True
  • He redefines __getitem__() and __setitem__(), which are called every time _siftup() or _siftdown() assigns or reads something (note: these two are not implemented in C, so they do use __getitem__() and __setitem__())
  • Since self.heap is True and __getitem__() and __setitem__() are being called, the _repair() function gets called every time _siftup() or _siftdown() swaps elements. But the call to heapify() is done in C, so __getitem__() doesn't get called and it doesn't end in an infinite loop
  • He redefines self.sort, so calling it, as he attempts to do, would fail
  • He reads once, but updates an item nb_updates times, not the 1:1 ratio he claims

I fixed the example. I verified it as much as I could, but we all make mistakes. Feel free to check it yourself.

Code:

import time
import random
from heapq import _siftup, _siftdown, heapify, heappop

class UpdateHeap(list):
    def __init__(self, values):
        super().__init__(values)
        heapify(self)

    def update(self, index, value):
        old, self[index] = self[index], value
        if value > old:
            _siftup(self, index)
        else:
            _siftdown(self, 0, index)

    def pop(self):
        return heappop(self)

class SlowHeap(list):
    def __init__(self, values):
        super().__init__(values)
        heapify(self)
        self._broken = False

    # Solution 2 and 3) repair using sort/heapify in a lazy way:
    def update(self, index, value):
        super().__setitem__(index, value)
        self._broken = True

    def __getitem__(self, index):
        if self._broken:
            self._repair()
            self._broken = False
        return super().__getitem__(index)

    def _repair(self):
        ...

    def pop(self):
        if self._broken:
            self._repair()
        return heappop(self)

class HeapifyHeap(SlowHeap):
    def _repair(self):
        heapify(self)

class SortHeap(SlowHeap):
    def _repair(self):
        self.sort()

def rand_update(heap):
    index = random.randint(0, len(heap)-1)
    new_value = random.randint(max_int+1, max_int*2)
    heap.update(index, new_value)

def rand_updates(update_count, heap):
    for i in range(update_count):
        rand_update(heap)
        heap[0]

def verify(heap):
    last = None
    while heap:
        item = heap.pop()
        if last is not None and item < last:
            raise RuntimeError(f"{item} was smaller than last {last}")
        last = item

def run_perf_test(update_count, data, heap_class):
    test_heap = heap_class(data)
    t0 = time.time()
    rand_updates(update_count, test_heap)
    perf = (time.time() - t0)*1e3
    verify(test_heap)
    return perf

results = []
max_int = 500
update_count = 100

for i in range(2, 7):
    test_size = 10**i
    test_data = [random.randint(0, max_int) for _ in range(test_size)]

    perf = run_perf_test(update_count, test_data, UpdateHeap)
    results.append((test_size, "update", perf))

    perf = run_perf_test(update_count, test_data, HeapifyHeap)
    results.append((test_size, "heapify", perf))

    perf = run_perf_test(update_count, test_data, SortHeap)
    results.append((test_size, "sort", perf))

import pandas as pd
import seaborn as sns

dtf = pd.DataFrame(results, columns=["heap size", "method", "duration (ms)"])
print(dtf)

sns.lineplot(
    data=dtf,
    x="heap size",
    y="duration (ms)",
    hue="method",
)

Results:

As you can see, the "update" method using _siftdown() and _siftup() is asymptotically faster.

You should know what your code does and how long it takes to run. When in doubt, you should check it. @cglaced checked how long the execution takes, but he didn't question how long it should take. Had he done that, he'd have found that the two don't match. And others fell for it.

    heap size   method  duration (ms)
0         100   update       0.219107
1         100  heapify       0.412703
2         100     sort       0.242710
3        1000   update       0.198841
4        1000  heapify       2.947330
5        1000     sort       0.605345
6       10000   update       0.203848
7       10000  heapify      32.759190
8       10000     sort       4.621506
9      100000   update       0.348568
10     100000  heapify     327.646971
11     100000     sort      49.481153
12    1000000   update       0.256062
13    1000000  heapify    3475.244761
14    1000000     sort    1106.570005
