Populating a new DataFrame column by calling a calculation

Question

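I want to calculate the distance between pairs of coordinates, so the calculation is fairly involved. I know I can create a new column like this: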

df['distance'] = calculation    # where calculation is the distance formula between coordinates

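But how do I define a more complex calculation, such as the formula below, so that the distance column is filled in row by row once it has been computed?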

dlon = lon2 - lon1
dlat = lat2 - lat1
a = (sin(dlat/2))^2 + cos(lat1) * cos(lat2) * (sin(dlon/2))^2
c = 2 * atan2( sqrt(a), sqrt(1-a) )
d = R * c (where R is the radius of the Earth)

Do I use def?

Example df:

lat1      long1      lat2      long2
34.43432  134.23423  34.42321  128.23244
34.42132  132.23231  32.32321  140.43213

Tags: python, dataframe

Answer


I think the apply function is what you are looking for.

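from math import radians, sin, cos, sqrt, atan2

R = 6371.0  # mean radius of the Earth in km (use 3958.8 for miles)

def f(lat1, lon1, lat2, lon2):
    # Convert degrees to radians; keep the sign so that southern and
    # western coordinates are handled correctly.
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))

    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c

# axis=1 passes each row to the lambda
df['distance'] = df.apply(lambda x: f(x.lat1, x.long1, x.lat2, x.long2), axis=1)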
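
If the frame is large, a row-by-row apply can be slow. As a rough sketch of an alternative (not part of the original answer), the same formula can be written with NumPy so that it works on whole columns at once. The function name haversine_km and the constant EARTH_RADIUS_KM are made up for this illustration, and the radius of 6371 km is an assumed mean value:

```python
import numpy as np
import pandas as pd

# Assumed mean Earth radius in kilometres; swap in 3958.8 for miles.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km for scalars, arrays, or Series given in degrees."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return EARTH_RADIUS_KM * c


# The example frame from the question.
df = pd.DataFrame({
    "lat1": [34.43432, 34.42132],
    "long1": [134.23423, 132.23231],
    "lat2": [34.42321, 32.32321],
    "long2": [128.23244, 140.43213],
})

# Whole columns go in, a whole column comes out; no apply or lambda needed.
df["distance"] = haversine_km(df["lat1"], df["long1"], df["lat2"], df["long2"])
```

This should give the same values as the apply version, since only the element-wise math functions have been swapped for their NumPy counterparts.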

Instead of writing the calculation yourself, you can also use the geopy package:

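import geopy.distance

# distance() returns a geopy Distance object; .km extracts the value in kilometres as a float.
# geopy measures along an ellipsoid, so results differ slightly from the spherical formula above.
df['distance_2'] = df.apply(lambda x: geopy.distance.distance((x.lat1, x.long1), (x.lat2, x.long2)).km, axis=1)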
