Iterating over two DataFrames to apply a function

Question

I have the following two DataFrames (shortened):

df1
day Transmitter_ID  Species Lat Lng Date
4   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  13/08/2015
5   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  14/08/2015
6   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  15/08/2015
7   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  16/08/2015
8   A69-1601-27466  Golden perch    -35.5065473 144.4488804 17/08/2015
8   A69-1601-27466  Golden perch    -35.495479100000004 144.45295380000002  17/08/2015
9   A69-1601-27466  Golden perch    -35.5065473 144.4488804 18/08/2015
10  A69-1601-27466  Golden perch    -35.5065473 144.4488804 19/08/2015
11  A69-1601-27466  Golden perch    -35.5065473 144.4488804 20/08/2015
12  A69-1601-27466  Golden perch    -35.5065473 144.4488804 21/08/2015
13  A69-1601-27466  Golden perch    -35.5065473 144.4488804 22/08/2015
14  A69-1601-27466  Golden perch    -35.5065473 144.4488804 23/08/2015
15  A69-1601-27466  Golden perch    -35.5065473 144.4488804 24/08/2015

rivergps_df
Lng Lat River
151.7753278 -32.90526725    HUNTER RIVER
151.77526830000002  -32.90610052    HUNTER RIVER
151.77526830000002  -32.90752299    HUNTER RIVER
151.77526830000002  -32.90758849    HUNTER RIVER
151.775397  -32.90977754    HUNTER RIVER
151.7754468 -32.91062396    HUNTER RIVER
151.775578  -32.91202941    HUNTER RIVER
151.77578799999998  -32.9142797 HUNTER RIVER
151.7758178 -32.91459931    HUNTER RIVER
151.77586340000002  -32.91508789    HUNTER RIVER
151.7764116 -32.91645856    HUNTER RIVER
151.7765776 -32.91687345    HUNTER RIVER
151.77719040000002  -32.91861786    HUNTER RIVER

I also have a haversine function that takes two lng/lat pairs and returns the distance between them:

from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

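What I want to do with these two DataFrames is:

Take each lng/lat from df1 and, for each point, apply the haversine function against the full range of lng/lat values in rivergps_df

Return the index of rivergps_df at which the haversine function reaches its minimum

Append this rivergps_df index to df1

To be concrete: for the first point in df1 (-35.495479100000004, 144.45295380000002), I want to apply the haversine function to lon1, lat1 against lon2, lat2, where lon2, lat2 ranges over every point in rivergps_df. Then I want to find the minimum value returned by the haversine function, append it to df1, and move on to the next point in df1.

How can I do this?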

Tags: python, pandas

Answer


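One idea:

  • Define a function haversin_argmin(lat, lon, df) that iterates over df (e.g. for idx, (lat2, lon2) in df[['Lat', 'Lng']].iterrows():) and computes and returns the argmin of haversine(lon, lat, lon2, lat2). Note that haversine takes longitude before latitude.

  • Then define another function f that takes a row, gets its lat and lon, calls haversin_argmin with rivergps_df, and returns the row with the argmin appended as a new field.

  • Use pandas.DataFrame.apply to apply f to df1.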
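The steps above can be sketched roughly as follows. This is only an illustration under a few assumptions: the column names Lat and Lng are taken from the question's data, the output column name nearest_river_idx is made up for this example, and f here returns just the index (assigned as a new column) rather than a modified copy of the row, which keeps the column's integer dtype intact.

```python
from math import radians, sin, cos, asin, sqrt

import pandas as pd


def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * asin(sqrt(a)) * 6371  # 6371 km: mean Earth radius


def haversin_argmin(lat, lon, df):
    """Return the index label of the row in df that is closest to (lat, lon)."""
    best_idx, best_dist = None, float("inf")
    for idx, row in df[["Lat", "Lng"]].iterrows():
        dist = haversine(lon, lat, row["Lng"], row["Lat"])
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def f(row, river_df):
    """Look up the nearest river point for one row of df1."""
    return haversin_argmin(row["Lat"], row["Lng"], river_df)


# A few rows copied from the question's data
df1 = pd.DataFrame({
    "Lat": [-35.4954791, -35.5065473],
    "Lng": [144.4529538, 144.4488804],
})
rivergps_df = pd.DataFrame({
    "Lng": [151.7753278, 151.7754468, 151.7771904],
    "Lat": [-32.90526725, -32.91062396, -32.91861786],
    "River": ["HUNTER RIVER"] * 3,
})

# axis=1 passes each row to f; args forwards rivergps_df as the second argument
df1["nearest_river_idx"] = df1.apply(f, axis=1, args=(rivergps_df,))
print(df1)
```

This loops over every row of rivergps_df for every row of df1, so the running time grows with the product of the two lengths. That is fine for small inputs but may become slow for long river tracks.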

Read the apply documentation to better understand how to define f and what to pass to apply.
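If the row-by-row approach turns out to be too slow, one possible alternative (not part of the suggestion above) is to compute all distances at once with NumPy broadcasting. The sketch below assumes the same Lat and Lng column names as the question; the function name nearest_river_index and the output column name are hypothetical.

```python
import numpy as np
import pandas as pd


def nearest_river_index(points, rivers):
    """For each row of points, return the index label of the closest row in rivers."""
    # Column vectors for the query points, row vectors for the river points,
    # so that every arithmetic step broadcasts to shape (len(points), len(rivers)).
    lat1 = np.radians(points["Lat"].to_numpy())[:, None]
    lon1 = np.radians(points["Lng"].to_numpy())[:, None]
    lat2 = np.radians(rivers["Lat"].to_numpy())[None, :]
    lon2 = np.radians(rivers["Lng"].to_numpy())[None, :]

    # Haversine formula applied to the whole grid of pairs
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist_km = 2 * 6371 * np.arcsin(np.sqrt(a))

    # Position of the minimum in each row, mapped back to rivers' index labels
    return rivers.index.to_numpy()[dist_km.argmin(axis=1)]


# A few rows copied from the question's data
df1 = pd.DataFrame({
    "Lat": [-35.4954791, -35.5065473],
    "Lng": [144.4529538, 144.4488804],
})
rivergps_df = pd.DataFrame({
    "Lng": [151.7753278, 151.7754468, 151.7771904],
    "Lat": [-32.90526725, -32.91062396, -32.91861786],
    "River": ["HUNTER RIVER"] * 3,
})

df1["nearest_river_idx"] = nearest_river_index(df1, rivergps_df)
print(df1)
```

The trade-off is memory: the intermediate distance matrix has len(df1) × len(rivergps_df) entries, so very large inputs may need to be processed in chunks.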

