I have a DataFrame of 3-axis data with a membership label that I use for grouping:
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 2, 0], [-1, 0, 1, 0], [-2, 0, 3, 1], [1, 1, 3, 1], [1, 0, 2, 2],
     [1, 0, 3, 2], [6, 2, 1, 5], [-4, 3, 0, 5], [1, 0, -1, 6], [0, 0, 3, 6]],
    columns=['x', 'y', 'z', 'member'])
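My goal is a bit contrived: I want to find the pairwise distances between the points of each group and the points of the next n_skip groups, ordered from smallest to largest. This is what I mean by staggered.
For example, for n_skip=2, I want to find the following distances: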
rows with member == 0 --> against member == 1, 2
rows with member == 1 --> against member == 2, 5
rows with member == 2 --> against member == 5, 6
rows with member == 5 --> against member == 6
Nothing is computed for member == 6.
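The pairing rule above can be made concrete by listing, for each label that occurs, the next n_skip labels that also occur. This is a minimal sketch; the helper name staggered_pairs is hypothetical and is not part of pandas or NumPy.

```python
import numpy as np

def staggered_pairs(members, n_skip):
    """Map each existing label to the next n_skip existing labels (hypothetical helper)."""
    labels = np.unique(members)  # sorted unique labels
    return {
        int(a): [int(b) for b in labels[i + 1:i + 1 + n_skip]]
        for i, a in enumerate(labels)
    }

# Labels taken from the 'member' column of the example DataFrame
pairs = staggered_pairs([0, 0, 1, 1, 2, 2, 5, 5, 6, 6], n_skip=2)
print(pairs)
```

For the example labels this gives 0 against 1 and 2, 1 against 2 and 5, 2 against 5 and 6, 5 against 6, and nothing for 6.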
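Is there a performant way to do this without nested for loops? apply was mentioned in the answers to this question. Intuitively, though, I can't use the conventional ways of parallelizing a function over a Pandas DataFrame here. What is a fast way to apply a function to a staggered set of groups?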
EDIT1: My solution (works for one axis only):
import numpy as np
from scipy.spatial.distance import cdist

# Organize by group membership
groups = df.groupby('member')

# Define constants
max_member = 6
n_skip = 2
start_row = 0
matrix = np.zeros((df.shape[0], df.shape[0]))

# Iterate over each source group (the last group has nothing after it)
for i in range(max_member):
    try:
        pts_curr = groups.get_group(i)
    except KeyError:
        continue
    # Save end row index
    end_row = start_row + pts_curr.shape[0]
    # Save start col index
    start_col = end_row
    # Grab the destination group nodes. The bound is max_member + 1 so that
    # the last label can still be a destination. Note that j walks consecutive
    # label values, so absent labels (3 and 4 here) are simply skipped.
    for j in range(i + 1, min(i + n_skip + 1, max_member + 1)):
        try:
            pts_next = groups.get_group(j)
        except KeyError:
            continue
        # Save end col index
        end_col = start_col + pts_next.shape[0]
        # Calculate cdist on the z axis only
        z_sq = cdist(pts_curr[['z']], pts_next[['z']])
        # Save results in matrix at right positions
        matrix[start_row:end_row, start_col:end_col] = z_sq
        # Update col index
        start_col = end_col
    # Update row index
    start_row = end_row
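A possible loop-free variant of this idea, extended to all three axes, is to compute the full pairwise distance matrix once and then mask it by how many groups apart each pair of rows is. This is a sketch under assumptions, not a benchmarked answer: the function name staggered_distances and its parameters are made up for illustration, plain NumPy broadcasting stands in for cdist, and "next group" is taken to mean the next label that actually occurs in the data.

```python
import numpy as np
import pandas as pd

def staggered_distances(df, n_skip, coords=("x", "y", "z"), label="member"):
    """Hypothetical helper: distances from each row to rows in the next n_skip groups."""
    # Rank of each row's label among the sorted unique labels (0, 1, 2, ...)
    _, rank = np.unique(df[label].to_numpy(), return_inverse=True)
    pts = df[list(coords)].to_numpy(dtype=float)
    # Full N x N Euclidean distance matrix via broadcasting
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # gap[r, c] = how many groups the column's row lies after the row's group
    gap = rank[None, :] - rank[:, None]
    mask = (gap >= 1) & (gap <= n_skip)
    # Entries outside the staggered window are left at zero
    return np.where(mask, dist, 0.0)

df = pd.DataFrame(
    [[0, 1, 2, 0], [-1, 0, 1, 0], [-2, 0, 3, 1], [1, 1, 3, 1], [1, 0, 2, 2],
     [1, 0, 3, 2], [6, 2, 1, 5], [-4, 3, 0, 5], [1, 0, -1, 6], [0, 0, 3, 6]],
    columns=["x", "y", "z", "member"])

result = staggered_distances(df, n_skip=2)
```

The trade-off is memory and wasted work: the whole N x N matrix is built even though only a band of it is kept, so this suits moderate row counts. Because the mask is built from label ranks, the rows do not need to be sorted by member.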
慕哥6287543