
Applying a function to every observation in a DataFrame

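I have a large df of coordinates that I am passing through a function (a reverse geocoder). How can I process the whole df without iterating over it row by row (which takes a long time)?

Example df: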


    Latitude    Longitude
0   -25.66026   28.0914
1   -25.67923   28.10525
2   -30.68456   19.21694
3   -30.12345   22.34256
4   -15.12546   17.12365

After running the function (without a for loop...), I want a df like this:


     City
0    HappyPlace
1    SadPlace
2    AveragePlace
3    CoolPlace
4    BadPlace

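Note: I don't need to know how to do reverse geocoding. This is a question about applying a function to an entire df without iterating.

Edit:

Using df.apply() might not work, because my code looks like this: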


city = []
for i in range(len(df)):
    results = g.reverse_geocode(df['LATITUDE'][i], df['LONGITUDE'][i])
    city.append(results.city)


犯罪嫌疑人X
115 views · 1 answer

1 answer

萧十郎

The slower approach: loop over the list of geo points and look up the city for each one in turn.

import time

import pandas as pd

d = {'Latitude': [-25.66026, -25.67923, -30.68456, -30.12345, -15.12546,
                  -25.66026, -25.67923, -30.68456, -30.12345, -15.12546],
     'Longitude': [28.0914, 28.10525, 19.21694, 22.34256, 17.12365,
                   28.0914, 28.10525, 19.21694, 22.34256, 17.12365]}
df = pd.DataFrame(data=d)

# geo_reverse() stands in for g.reverse_geocode()
def geo_reverse(lat, long):
    time.sleep(2)  # assume each reverse_geocode call takes 2 seconds
    print(lat, long)

for i in range(len(df)):
    results = geo_reverse(df['Latitude'][i], df['Longitude'][i])

Because of time.sleep(2), the program above needs at least 20 seconds to process all ten geo points.

A better approach than the one above:

import threading
import time

import pandas as pd

d = {'Latitude': [-25.66026, -25.67923, -30.68456, -30.12345, -15.12546,
                  -25.66026, -25.67923, -30.68456, -30.12345, -15.12546],
     'Longitude': [28.0914, 28.10525, 19.21694, 22.34256, 17.12365,
                   28.0914, 28.10525, 19.21694, 22.34256, 17.12365]}
df = pd.DataFrame(data=d)

def runnable_method(f, args):
    # result_info[0] signals completion, result_info[1] holds the return value
    result_info = [threading.Event(), None]

    def runit():
        result_info[1] = f(args)
        result_info[0].set()

    threading.Thread(target=runit).start()
    return result_info

def gather_results(result_infos):
    results = []
    for result_info in result_infos:
        result_info[0].wait()  # block until this thread has finished
        results.append(result_info[1])
    return results

def geo_reverse(args):
    time.sleep(2)  # assume each lookup takes 2 seconds
    return "City Name of (" + str(args[0]) + "," + str(args[1]) + ")"

geo_points = []
for i in range(len(df)):
    tuple_i = (df['Latitude'][i], df['Longitude'][i])
    geo_points.append(tuple_i)

result_info = [runnable_method(geo_reverse, geo_point) for geo_point in geo_points]
cities_result = gather_results(result_info)
print(cities_result)

Note that geo_reverse here takes 2 seconds to fetch the data for one geo point. In the second example all lookups wait at the same time, so the whole batch finishes in about 2 seconds instead of 2 seconds per point.

Note: try both approaches, assuming your geo_reverse takes about 2 seconds to fetch its data. The first approach takes about 20+1 seconds, and its processing time grows with the number of inputs. The second approach has a nearly constant processing time (about 2+1 seconds) however many geo points you process, as long as the work is spent waiting on I/O and the machine (and the geocoding service) can handle one thread per point.

geo_reverse() in the code above stands in for your g.reverse_geocode() method. Run the two programs separately and see the difference for yourself.

Explanation: the main part of the code above builds a list of tuples and then, in a list comprehension, hands each tuple to a thread created on the fly:

# Convert the df of geo points into a list of tuples
geo_points = []
for i in range(len(df)):
    tuple_i = (df['Latitude'][i], df['Longitude'][i])
    geo_points.append(tuple_i)

# List comprehension that starts one thread per geo point
result_info = [runnable_method(geo_reverse, geo_point) for geo_point in geo_points]

# Gather the result from each thread
cities_result = gather_results(result_info)
print(cities_result)
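The same idea can also be sketched with the standard library's concurrent.futures.ThreadPoolExecutor, which caps the number of threads and returns results in input order, so they can go straight into a City column like the one the question asks for. This is only a sketch under assumptions, not the answer's original method: geo_reverse is a hypothetical stand-in for g.reverse_geocode(lat, lon).city, the 0.2 second delay is simulated latency, add_city_column is a made-up helper name, and max_workers=8 is an arbitrary value that you would tune to whatever your geocoding service allows.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def geo_reverse(lat, lon):
    """Hypothetical stand-in for g.reverse_geocode(lat, lon).city."""
    time.sleep(0.2)  # simulated network latency (an assumption)
    return f"City of ({lat}, {lon})"


def add_city_column(df, lookup, max_workers=8):
    """Return a copy of df with a City column filled by concurrent lookups."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so they align with the rows
        cities = list(pool.map(lookup, df["Latitude"], df["Longitude"]))
    out = df.copy()
    out["City"] = cities
    return out


df = pd.DataFrame({
    "Latitude": [-25.66026, -25.67923, -30.68456, -30.12345, -15.12546],
    "Longitude": [28.0914, 28.10525, 19.21694, 22.34256, 17.12365],
})

result = add_city_column(df, geo_reverse)
print(result[["City"]])
```

With a real geocoder you would pass something like lambda lat, lon: g.reverse_geocode(lat, lon).city as lookup. A bounded pool is usually the safer choice for a large df, since one thread per row can exhaust resources or run into the service's rate limits.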