森栏
Here is one approach that might work. Consider: generate the cross product between the two dataframes Points and Links, then apply a function to each row of the new DataFrame, and finally find the smallest distance the function reports for each group. Let's call the new df PointsLinks.

Below is some code that takes this approach:

    import pandas as pd
    import random

    Points = pd.DataFrame( [ [ 1,2 ], [ 3,4 ], [ 5,6 ] ], columns = [ 'longitude', 'latitude' ] )
    Links = pd.DataFrame( [ [ 'Link1', ( 4,3 ) , ( -1, -2 ) ], [ 'Link2', (10,10) , ( -5, -5 ) ] ], columns = [ 'linkid', 'lon1&lat1', 'lon2&lat2' ] )

    print(Points)
    print(Links)

    #Step 1: https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas
    def cartesian_product_basic(left, right):
        # Join every row of left with every row of right via a temporary constant key
        return (
            left.assign(key=1).merge(right.assign(key=1), on='key').drop(columns='key'))

    def DistanceToLink( pointlink ):
        # Placeholder: returns a random number instead of a real distance
        return random.randrange(10)

    PointsLinks = cartesian_product_basic(Points,Links)
    print( PointsLinks )

    #Step 2: https://stackoverflow.com/questions/26886653/pandas-create-new-column-based-on-values-from-other-columns-apply-a-function-o
    PointsLinks['distance'] = PointsLinks.apply( lambda row : DistanceToLink(row), axis = 'columns' )
    print( PointsLinks )

    #Step 3: Find the smallest distance per group https://stackoverflow.com/questions/27842613/pandas-groupby-sort-within-groups
    closest = PointsLinks.sort_values( [ 'latitude', 'longitude', 'distance' ] , ascending = True ).groupby( [ 'latitude', 'longitude'] ).head(1)

    # Drop the unnecessary columns
    closest = closest.drop( columns = ['lon1&lat1','lon2&lat2','distance'] )
    print(closest)

Here are the dataframes the code creates.

Points:

       longitude  latitude
    0          1         2
    1          3         4
    2          5         6

Links:

      linkid lon1&lat1 lon2&lat2
    0  Link1    (4, 3)  (-1, -2)
    1  Link2  (10, 10)  (-5, -5)

and then PointsLinks (after the distance column has been added with apply()):

       longitude  latitude linkid lon1&lat1 lon2&lat2  distance
    0          1         2  Link1    (4, 3)  (-1, -2)         1
    1          1         2  Link2  (10, 10)  (-5, -5)         6
    2          3         4  Link1    (4, 3)  (-1, -2)         0
    3          3         4  Link2  (10, 10)  (-5, -5)         9
    4          5         6  Link1    (4, 3)  (-1, -2)         5
    5          5         6  Link2  (10, 10)  (-5, -5)         1

I did not implement DistanceToLink. I just put a random number generator there. This is what the first pointlink object looks like (it is a Series representing one row):

    longitude           1
    latitude            2
    linkid          Link1
    lon1&lat1      (4, 3)
    lon2&lat2    (-1, -2)

Now that you have the distance for every combination, you can find and select the PointLink pair with the shortest distance (using pandas groupby sort within groups):

    closest = PointsLinks.sort_values( [ 'latitude', 'longitude', 'distance' ] , ascending = True ).groupby( [ 'latitude', 'longitude'] ).head(1)

Here are the results:

       longitude  latitude linkid
    0          1         2  Link1
    2          3         4  Link1
    5          5         6  Link2
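Since DistanceToLink is left as a random placeholder, here is one possible way to fill it in. This is only a sketch under a strong assumption: longitude and latitude are treated as flat x/y coordinates, so the result is a planar point-to-segment distance, not a geodesic one. The names `distance_to_link`, `points_links` and `closest_idx` are mine, and the sketch swaps in two alternatives to the code in this answer: `merge(how="cross")` (available from pandas 1.2) for the cross product and `groupby(...).idxmin()` for picking the closest link.

```python
import math

import pandas as pd


def distance_to_link(row):
    """Planar distance from (longitude, latitude) to the segment between
    row['lon1&lat1'] and row['lon2&lat2'].

    Assumption: coordinates are flat x/y values; Earth curvature is ignored.
    """
    px, py = row["longitude"], row["latitude"]
    (x1, y1), (x2, y2) = row["lon1&lat1"], row["lon2&lat2"]
    dx, dy = x2 - x1, y2 - y1
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # Degenerate link: both endpoints coincide.
        return math.hypot(px - x1, py - y1)
    # Projection parameter of the point onto the segment, clamped to [0, 1]
    # so the nearest location never falls beyond an endpoint.
    t = ((px - x1) * dx + (py - y1) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))


points = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["longitude", "latitude"])
links = pd.DataFrame(
    [["Link1", (4, 3), (-1, -2)], ["Link2", (10, 10), (-5, -5)]],
    columns=["linkid", "lon1&lat1", "lon2&lat2"],
)

# Cross product of points and links; how="cross" requires pandas 1.2 or later.
points_links = points.merge(links, how="cross")
points_links["distance"] = points_links.apply(distance_to_link, axis="columns")

# idxmin gives the row label of the smallest distance within each point group.
closest_idx = points_links.groupby(["longitude", "latitude"])["distance"].idxmin()
closest = points_links.loc[closest_idx, ["longitude", "latitude", "linkid"]]
print(closest)
```

With the sample data, every point lies nearer to the line y = x than to Link1, so this planar version assigns all three points to Link2. For real geographic coordinates you would probably want to project to a metric coordinate system first or use a geodesic distance.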