HUX布斯
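SciPy can help here. Consider the following hypothetical example:

    import numpy as np
    import pandas as pd
    from scipy.spatial import cKDTree

    dataset1 = pd.DataFrame(np.random.rand(100, 3))
    dataset2 = pd.DataFrame(np.random.rand(10, 3))

    # build a k-d tree on dataset1, then for each row of dataset2
    # list the indices of the dataset1 rows within distance r
    ck = cKDTree(dataset1.values)
    ck.query_ball_point(dataset2.values, r=0.1)

    array([list([]), list([]), list([]), list([]), list([28, 83]), list([79]),
           list([]), list([86]), list([40]), list([29, 60, 95])], dtype=object)

Each entry of the result corresponds to one row of dataset2. The data is random, so the exact indices differ between runs.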
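If you also need the distances, the index lists returned by query_ball_point can be flattened into a table of matching pairs. The following is only a minimal sketch: the fixed seed, the column names (dataset2_row, dataset1_row, distance) and the variable names are assumptions made for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

# Fixed seed: an assumption, used only so the sketch is reproducible.
rng = np.random.default_rng(0)
dataset1 = pd.DataFrame(rng.random((100, 3)))
dataset2 = pd.DataFrame(rng.random((10, 3)))
r = 0.1

tree = cKDTree(dataset1.values)
# One list of dataset1 row indices per row of dataset2.
neighbors = tree.query_ball_point(dataset2.values, r=r)

rows = []
for i2, idxs in enumerate(neighbors):
    for i1 in idxs:
        # Euclidean distance between the matched rows.
        dist = np.linalg.norm(dataset2.values[i2] - dataset1.values[i1])
        rows.append((i2, i1, dist))

# Hypothetical column names, chosen for this example.
pairs = pd.DataFrame(rows, columns=["dataset2_row", "dataset1_row", "distance"])
print(pairs)
```

The tree is built on the larger dataset and queried with the smaller one, so the search does not have to compare every row of one dataset against every row of the other.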
函数式编程
A NumPy approach: if your two dataframes look like this:

    df1
        coords
    0  (4,3,5)
    1  (5,4,3)

    df2
        coords
    0  (6,7,8)
    1  (8,7,6)

then:

    import numpy as np
    import pandas as pd
    from itertools import product

    # convert dataframes into numpy arrays
    df1_arr = np.array([np.array(x) for x in df1.coords.values])
    df2_arr = np.array([np.array(x) for x in df2.coords.values])

    # create array of the Cartesian product of elements of the two arrays
    cart_arr = np.array([x for x in product(df1_arr, df2_arr)])

    # compute the Euclidean distance (norm) between the elements of each pair;
    # outputs a new array with one value per pair of coordinates
    norms_arr = np.linalg.norm(np.diff(cart_arr, axis=1)[:, 0, :], axis=1)

    # distance threshold for "close enough"
    radius = 5.5

    # find values in the norms array that are less than or equal to the threshold
    good_idxs = np.argwhere(norms_arr <= radius)[:, 0]
    good_coord_pairs = cart_arr[good_idxs]

    # store the matching pairs of coordinates and their distances in a new dataframe
    final_df = pd.DataFrame(
        {'df1_coords': list(map(tuple, good_coord_pairs[:, 0, :])),
         'df2_coords': list(map(tuple, good_coord_pairs[:, 1, :])),
         'distance': norms_arr[good_idxs]},
        index=list(range(len(good_coord_pairs))))

which produces:

    final_df
      df1_coords df2_coords  distance
    0    (4,3,5)    (6,7,8)  5.385165
    1    (5,4,3)    (8,7,6)  5.196152
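The same pairwise comparison can also be written without building the Cartesian product in Python. The sketch below swaps in scipy.spatial.distance.cdist for the product/diff/norm steps. It is an alternative technique, not the method of the answer above, and it assumes the df1 and df2 frames shown there, reconstructed here so the block runs on its own.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# The two example frames from the answer, rebuilt here as an assumption.
df1 = pd.DataFrame({"coords": [(4, 3, 5), (5, 4, 3)]})
df2 = pd.DataFrame({"coords": [(6, 7, 8), (8, 7, 6)]})

# Turn the columns of tuples into (n, 3) arrays.
a = np.array(df1["coords"].tolist())
b = np.array(df2["coords"].tolist())

# dist[i, j] is the Euclidean distance between a[i] and b[j].
dist = cdist(a, b)

radius = 5.5
# Row and column indices of every pair within the threshold.
i, j = np.nonzero(dist <= radius)

final_df = pd.DataFrame({
    "df1_coords": [tuple(x) for x in a[i]],
    "df2_coords": [tuple(x) for x in b[j]],
    "distance": dist[i, j],
})
print(final_df)
```

Like the Cartesian-product version, this computes every pairwise distance, so memory grows with the product of the two row counts. For large inputs a tree-based search such as cKDTree is usually the better fit.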