Python: How do I compute the distances between all points in a DataFrame?

Problem description

I have a DataFrame like this

df
       lat           lon    idx
0   42.363427   -71.096072   0
1   42.360000   -71.090000   1
2   42.360000   -71.090000   2
3   42.364733   -71.095312   3
4   42.360000   -71.090000   4

I want to compute the distance in kilometers between all pairs of points. This is the function I am using

from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points 
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians 
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])

    # haversine formula 
    dlon = lon2 - lon1 
    dlat = lat2 - lat1 
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a)) 
    r = 6371 # Radius of earth in kilometers. Use 3956 for miles
    return c * r

This is what I tried

RES = []
for i in df.index:
    s1 = df[df.index == i]
    for j in df.index:
        if j > i:
            s2 = df[df.index == j]
            lon1 = s1.lon.values
            lon2 = s2.lon.values
            lat1 = s1.lat.values
            lat2 = s2.lat.values
            distance = haversine(lon1, lat1, lon2, lat2)
            RES.append([s1.idx, s2.idx, distance, lat1, lon1, lat2, lon2])

I would like to know whether there is a way to avoid the loops.

Something like

RES = df.apply(haversine(df.lon, df.lat, df.lon[1:], df.lat[1:]))

Tags: python, pandas

Solution


With cdist, you can build a matrix that holds all pairwise distances:

from scipy.spatial.distance import cdist
# 'euclidean' can be replaced by any other distance metric that cdist supports
distance_matrix = cdist(df.values[:, 0:2], df.values[:, 0:2], 'euclidean')

Output (Euclidean distances in degrees of lat/lon, not kilometers):

[[0.         0.00697234 0.00697234 0.00151104 0.00697234]
 [0.00697234 0.         0.         0.00711468 0.        ]
 [0.00697234 0.         0.         0.00711468 0.        ]
 [0.00151104 0.00711468 0.00711468 0.         0.00711468]
 [0.00697234 0.         0.         0.00711468 0.        ]]
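To get kilometers, as the question asks, the haversine formula from the question can be vectorized with NumPy broadcasting. This is a minimal sketch, not part of the original answer: the function name `haversine_matrix` and the `radius_km` parameter are hypothetical, and it assumes a spherical Earth with a mean radius of 6371 km.

```python
import numpy as np
import pandas as pd

def haversine_matrix(lat, lon, radius_km=6371.0):
    """Pairwise great-circle distances in km (hypothetical helper)."""
    # Convert decimal degrees to radians
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    # Column vector minus row vector broadcasts to an (n, n) array
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    # Clip guards against floating-point values slightly outside [0, 1]
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Same data as in the question
df = pd.DataFrame({
    "lat": [42.363427, 42.360000, 42.360000, 42.364733, 42.360000],
    "lon": [-71.096072, -71.090000, -71.090000, -71.095312, -71.090000],
    "idx": [0, 1, 2, 3, 4],
})
dist_km = haversine_matrix(df["lat"], df["lon"])
```

`dist_km[i, j]` then holds the distance between rows i and j. The full matrix needs memory proportional to n squared, so this approach suits moderate numbers of points.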

If you want the result in a DataFrame rather than a matrix, you can simply do:

pd.DataFrame(distance_matrix)
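The loop in the question collected one row per pair with j > i. The same long format can probably be obtained from a distance matrix without a loop by indexing its upper triangle with `np.triu_indices`. This is a sketch under assumptions: the small matrix below is a stand-in, and the column names `idx_1`, `idx_2` and `distance` are illustrative.

```python
import numpy as np
import pandas as pd

# Stand-in for a symmetric distance matrix (illustrative values only)
distance_matrix = np.array([
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 3.0],
    [2.0, 3.0, 0.0],
])
idx = np.array([0, 1, 2])  # hypothetical point labels, like the `idx` column

# Positions of the strict upper triangle, i.e. every pair with i < j
i, j = np.triu_indices(len(distance_matrix), k=1)
pairs = pd.DataFrame({
    "idx_1": idx[i],
    "idx_2": idx[j],
    "distance": distance_matrix[i, j],
})
```

Each unordered pair appears exactly once, which mirrors the `if j > i` condition in the original loop.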
