Joining two datasets that both contain points

I have two CSV files with points: a schools dataset (latitude, longitude and school name) and a dataset of house coordinates (latitude, longitude and houseid).


I want to list all houses that are within 500 metres of a school.


I really don't know how to do a spatial join with geopandas in Python. Can anyone help me?


schools.csv

56.039484;14.164114;Parkskolan

56.029687;14.159337;Centralskolan



houses.csv

56.039240;14.165066;1

56.039008;14.166709;2

56.038608;14.169420;3
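A quick way to sanity-check which of these sample houses should match, without geopandas, is to compute great-circle (haversine) distances directly. This is a minimal sketch assuming a spherical Earth with a mean radius of 6,371 km; the helper names `haversine_m` and `houses_near_schools` are hypothetical, and the rows are copied from the two files above.

```python
import math

# Rows copied from schools.csv and houses.csv above: (lat, lon, name or id)
SCHOOLS = [
    (56.039484, 14.164114, "Parkskolan"),
    (56.029687, 14.159337, "Centralskolan"),
]
HOUSES = [
    (56.039240, 14.165066, 1),
    (56.039008, 14.166709, 2),
    (56.038608, 14.169420, 3),
]

EARTH_RADIUS_M = 6_371_000  # assumption: spherical Earth, mean radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def houses_near_schools(houses, schools, radius_m=500.0):
    """Return (houseid, school_name, distance_m) for every house/school pair within radius_m."""
    pairs = []
    for h_lat, h_lon, house_id in houses:
        for s_lat, s_lon, school in schools:
            d = haversine_m(h_lat, h_lon, s_lat, s_lon)
            if d <= radius_m:
                pairs.append((house_id, school, round(d)))
    return pairs


for house_id, school, dist in houses_near_schools(HOUSES, SCHOOLS):
    print(house_id, school, dist)
```

This brute-force loop compares every house with every school, which is fine for small files; for large datasets a projected buffer plus the spatial-index join that geopandas `sjoin` performs scales much better.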


肥皂起泡泡

慕少森

The main steps to a solution:

1. Read the 2 data files into DataFrames.
2. Set the CRS ('EPSG:4326') and create Point geometries from (lat, long) for both DataFrames.
3. For the schools DataFrame, convert the CRS to UTM zone 33N.
4. Buffer the schools (radius = 500 m).
5. On the schools DataFrame, keep the original point as `original_geometry` and make the 500 m buffer the new `geometry`.
6. Do the spatial join between houses and schools in a common CRS.
7. Get the result from the houses_joined DataFrame.

Here is the working code:

import pandas as pd
import geopandas as gpd

# School data
# -----------
# read `schools.csv`; data are (lat, lon) in 'EPSG:4326', columns: lat;lon;school_name
# 56.039484;14.164114;Parkskolan
# 56.029687;14.159337;Centralskolan
# The files in the question have no header row, so the column names are passed explicitly;
# drop `header=None, names=[...]` if your file starts with a header line.
df_schools = pd.read_csv('schools.csv', sep=';', header=None, names=['lat', 'lon', 'school_name'])
# create Point geometries from (lon, lat): x = longitude, y = latitude
sch_geom = gpd.points_from_xy(df_schools.lon, df_schools.lat)
# set the initial coordinate reference system and the geometry column
gdf_schools = gpd.GeoDataFrame(df_schools, crs='EPSG:4326', geometry=sch_geom)

# convert the CRS from (lat, long) to UTM zone 33N (units: metres)
# and get the new dataframe gdf_schools_utm33N
gdf_schools_utm33N = gdf_schools.to_crs(epsg=32633)
# Note: epsg=32633 is equivalent to
# crs="+proj=utm +zone=33 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"

# keep the school points, then replace the active geometry with a 500 m buffer
gdf_schools_utm33N['original_geometry'] = gdf_schools_utm33N.geometry
gdf_schools_utm33N['geometry'] = gdf_schools_utm33N.geometry.buffer(500)

# Houses data
# -----------
# read `houses.csv`; data are (lat, lon) in 'EPSG:4326', columns: lat;lon;houseid
# 56.039240;14.165066;1
# 56.039008;14.166709;2
# 56.038608;14.169420;3
# 56.046108;14.171420;4
# I added the 4th house as a test case: it is too far away from all schools.
df_houses = pd.read_csv('houses.csv', sep=';', header=None, names=['lat', 'lon', 'houseid'])
# create Point geometries for the houses and set the initial CRS
hs_geom = gpd.points_from_xy(df_houses.lon, df_houses.lat)
gdf_houses = gpd.GeoDataFrame(df_houses, crs='EPSG:4326', geometry=hs_geom)

# optional: plot the schools' buffers and all the houses (requires matplotlib)
ax = gdf_schools_utm33N.plot(color='lightgray', edgecolor='green', alpha=0.5)
gdf_houses.to_crs(epsg=32633).plot(ax=ax, color='red')

# ******* Spatial join *****************
# the houses GeoDataFrame must be in the same CRS as the schools
hss = gdf_houses.to_crs(epsg=32633)
# spatial join of houses (points) ~ schools (circles of 500 m radius)
# (geopandas older than 0.10 spells this op='within' instead of predicate='within')
houses_joined = gpd.sjoin(hss, gdf_schools_utm33N, predicate='within', how='inner')

# print the successfully joined rows (houseid, school_name)
print(houses_joined[['houseid', 'school_name']])
# Output (houseid, school_name):
# 1    Parkskolan
# 2    Parkskolan
# 3    Parkskolan

[Figure: resulting plot of the 500 m school buffers (gray, green outline) with the houses in red]
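The answer hard-codes UTM zone 33N (EPSG:32633). Below is a minimal sketch of how that zone follows from the coordinates, using the standard 6° zone rule plus the southwestern-Norway exception. The function name `utm_epsg` is hypothetical, it ignores the Svalbard exceptions, and UTM is not defined beyond 84°N / 80°S. In geopandas 0.9 and later, `GeoDataFrame.estimate_utm_crs()` can pick a zone for you instead.

```python
import math


def utm_epsg(lat: float, lon: float) -> int:
    """Return the EPSG code of the WGS 84 / UTM zone that contains (lat, lon).

    Standard 6-degree zones, plus the exception that widens zone 32V over
    southwestern Norway. Svalbard exceptions are deliberately ignored.
    """
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    if 56.0 <= lat < 64.0 and 3.0 <= lon < 12.0:
        zone = 32  # zone 32V is widened to cover southwestern Norway
    # 326xx = northern hemisphere, 327xx = southern hemisphere
    return (32600 if lat >= 0 else 32700) + zone


print(utm_epsg(56.039484, 14.164114))  # Parkskolan -> 32633
```

Both schools lie at about 14.16°E, inside the 12°E to 18°E band of zone 33, which is why EPSG:32633 is a sensible metric CRS for a 500 m buffer here.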