Speeding up a double iterrows() in pandas

Your problem is that you loop far too many times. At the very least, you should compute a distance matrix once and count how many points fall within the radius using that matrix. The fastest solution, however, is to use NumPy's vectorized functions, which run as highly optimized C code. As with most learning experiences, it's best to start with a small problem:

```python
>>> import numpy as np
>>> import pandas as pd
>>> from scipy.spatial import distance_matrix

# Create a dataframe with two columns, MID_X and MID_Y, filled with random values
>>> np.random.seed(42)
>>> df = pd.DataFrame(np.random.uniform(1, 10, size=(5, 2)), columns=['MID_X', 'MID_Y'])
>>> df.index.name = 'PointID'
>>> df
            MID_X     MID_Y
PointID                    
0        4.370861  9.556429
1        7.587945  6.387926
2        2.404168  2.403951
3        1.522753  8.795585
4        6.410035  7.372653

# Calculate the distance matrix
>>> cols = ['MID_X', 'MID_Y']
>>> d = distance_matrix(df[cols].values, df[cols].values)
>>> d
array([[0.        , 4.51542241, 7.41793942, 2.94798323, 2.98782637],
       [4.51542241, 0.        , 6.53786001, 6.52559479, 1.53530446],
       [7.41793942, 6.53786001, 0.        , 6.4521226 , 6.38239593],
       [2.94798323, 6.52559479, 6.4521226 , 0.        , 5.09021286],
       [2.98782637, 1.53530446, 6.38239593, 5.09021286, 0.        ]])

# The radii you want to measure. They need to be raised by 2 extra
# dimensions to prepare for array broadcasting later
>>> radii = np.array([3, 6, 9])[:, None, None]
>>> radii
array([[[3]],

       [[6]],

       [[9]]])

# Count how many points fall within a certain radius of another point
# using NumPy's array broadcasting. `d < radii` returns an array of
# `True/False` of shape (3, 5, 5), and we count the `True` values by
# summing over the last axis.
#
# The distance from a point to itself is 0 and we don't want to count
# that, hence the -1.
>>> count = (d < radii).sum(axis=-1) - 1
>>> count
array([[2, 1, 0, 1, 2],
       [3, 2, 0, 2, 3],
       [4, 4, 4, 4, 4]])

# Putting everything together for export
>>> result = pd.DataFrame(count, index=radii.flatten()).stack().to_frame('Count')
>>> result.index.names = ['Radius', 'PointID']
>>> result
                Count
Radius PointID       
3      0            2
       1            1
       2            0
       3            1
       4            2
6      0            3
       1            2
       2            0
       3            2
       4            3
9      0            4
       1            4
       2            4
       3            4
       4            4
```

The final result means that within radius 3, point #0 has 2 neighbours, point #1 has 1 neighbour, point #2 has 0 neighbours, and so on. Reshape and format the frame to your liking. Scaling this up to thousands of points should be no problem; just keep in mind that `d` holds n × n distances and `d < radii` builds a boolean array of shape (len(radii), n, n), so memory grows quadratically with the number of points.
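If you need this more than once, the steps above can be wrapped in a function. The sketch below is one way to do it, under a few assumptions: the name `count_within_radii` and its `chunk_size` parameter are hypothetical (not part of pandas, NumPy or SciPy); it computes Euclidean distances with plain NumPy broadcasting instead of `scipy.spatial.distance_matrix`; and it handles the rows of the distance matrix in blocks, so memory grows with `chunk_size × n` rather than `n × n`. It keeps the same strict `d < radius` rule and the same `- 1` for the point itself, which assumes positive radii.

```python
import numpy as np
import pandas as pd


def count_within_radii(df, radii, cols=("MID_X", "MID_Y"), chunk_size=1000):
    """For each radius and each point, count how many *other* points lie
    strictly closer than that radius (same `d < radii` rule as above).

    Hypothetical helper, not a pandas/NumPy/SciPy API. Assumes radii > 0.
    """
    xy = df[list(cols)].to_numpy(dtype=float)           # (n, 2) coordinates
    radii = np.asarray(radii)
    r = radii.astype(float)[:, None, None]              # (k, 1, 1) for broadcasting
    n = len(xy)
    counts = np.empty((len(radii), n), dtype=np.int64)  # (k, n) result

    # Only a (chunk_size, n) block of the distance matrix exists at a time,
    # instead of the full (n, n) matrix.
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        diff = xy[start:stop, None, :] - xy[None, :, :]  # (m, n, 2) coordinate differences
        d = np.sqrt((diff ** 2).sum(axis=-1))            # (m, n) Euclidean distances
        # (k, 1, 1) < (m, n) broadcasts to (k, m, n); summing the last axis
        # counts neighbours per point, and -1 drops the point itself (distance 0).
        counts[:, start:stop] = (d < r).sum(axis=-1) - 1

    # Radius-major order of from_product matches counts.ravel() (row-major over (k, n)).
    index = pd.MultiIndex.from_product([radii, df.index], names=["Radius", "PointID"])
    return pd.DataFrame({"Count": counts.ravel()}, index=index)
```

With the 5-point `df` from the session above, `count_within_radii(df, [3, 6, 9])` gives the same counts as `result`, in the same (Radius, PointID) order, for any `chunk_size`. Building the index with `pd.MultiIndex.from_product` instead of `stack()` is only a choice to make that ordering explicit.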
