python - Getting a distance matrix of coordinates with Pandas without loops
Question
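I am currently building a distance matrix of coordinates from two data frames (ref_df and comp_df) with a nested for loop that iterates over the rows of both data frames, as shown below.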

import geopy.distance
import pandas as pd

ref_df = pd.DataFrame({
    "grp_id": ['M-00353', 'M-00353', 'M-00353', 'M-00538', 'M-00538', 'M-00160',
               'M-00160', 'M-00160', 'M-00509', 'M-00509', 'M-00509', 'M-00509'],
    "name": ['B1', 'IIS', 'IISB I', 'BK', 'MM - BK', 'H(SL)', 'H(PKS SL)', 'PTH',
             'ASSM 1', 'PKS SSM', 'SSM', 'Sukajadi Sawit Mekar 1'],
    "lat": [0.43462, 0.43462, 0.43462, 1.74887222, 1.74887222, -2.6081,
            -2.6081, -2.6081, -2.378258, -2.378258, -2.378258, -2.378258],
    "long": [101.822603, 101.822603, 101.822603, 101.3710944, 101.3710944, 104.12525,
             104.12525, 104.12525, 112.542356, 112.542356, 112.542356, 112.542356]})

comp_df = pd.DataFrame({
    "uml_id": ['PO1000000021', 'PO1000000054', 'PO1000000058', 'PO1000000106'],
    "mill_name": ['PT IIS-BI', 'PT MM-BK', 'HL', 'PT SSM'],
    "Latitude": [0.4344444, 0.077043, -2.6081, -2.381111],
    "Longitude": [101.825, 102.030838, 104.12525, 112.539722]})

matched_coords = []

# DataFrame.get_value was removed in pandas 1.0; .at is the scalar accessor
for row in ref_df.index:
    mill_id = ref_df.at[row, "grp_id"]
    mill_lat = ref_df.at[row, "lat"]
    mill_long = ref_df.at[row, "long"]
    for columns in comp_df.index:
        gm_id = comp_df.at[columns, "uml_id"]
        gm_lat = comp_df.at[columns, "Latitude"]
        gm_long = comp_df.at[columns, "Longitude"]
        dist = geopy.distance.distance(
            (mill_lat, mill_long),
            (gm_lat, gm_long)).km
        matched_coords.append([
            mill_id, mill_lat, mill_long,
            gm_id, gm_lat, gm_long, dist
        ])

# Convert to data frame
mc_df = pd.DataFrame(matched_coords)
mc_df.columns = [
    'grp_id', 'grp_lat', 'grp_long',
    'match_id', 'match_lat', 'match_long', 'dist'
]

# Pivot to create wide data frame (matrix of distances)
mc_wide_df = mc_df.pivot_table(
    values="dist",
    index=["grp_id", "grp_lat", "grp_long"],
    columns="match_id").reset_index()

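However, I would like to simplify the process and the code by writing a helper function and running it over the data frame with apply. My attempt below does not work. Can anyone help me figure out what is going wrong here?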

# Test apply!
def get_coords_dist(x):
    dist = geopy.distance.distance((x['lat'],x['long']),(comp_df['Latitude'],comp_df['Longitude'])).km
    return pd.Series({comp_df.iloc[i[2]]['uml_id']: i for i in dist})

mc_df = ref_df.merge(ref_df.sort_values('grp_id').apply(get_coords_dist, axis=1), left_index=True, right_index=True)

Answer
You are looking for the Cartesian product of the two data frames ref_df and comp_df. One way to build it is to pd.merge on a dummy column.

def distance_km(x, y):
    return geopy.distance.distance(x, y).km

# it looks like your coordinates depend only on grp_id
ref_df_dd = ref_df.drop_duplicates(['grp_id', 'lat', 'long'])

# assign a dummy "_" column in both data frames, merge, and drop the dummy
# column afterwards
merged_df = pd.merge(
    ref_df_dd.assign(_=1),
    comp_df.assign(_=1),
).drop('_', axis=1)

# apply your distance function on (lat, long) tuples in the Cartesian product
merged_df['distance'] = list(
    map(distance_km,
        merged_df[['lat', 'long']].apply(tuple, 1),
        merged_df[['Latitude', 'Longitude']].apply(tuple, 1)))

# pivot table
merged_df.set_index(['grp_id', 'uml_id']).distance.unstack()

The pivoted result (distances in km) looks like this:

uml_id   PO1000000021  PO1000000054  PO1000000058  PO1000000106
grp_id
M-00160    422.745678    377.461999      0.000000    936.147322
M-00353      0.267531     45.832819    422.922708   1232.700696
M-00509   1232.642382   1200.904305    936.449658      0.430525
M-00538    153.871840    198.911938    571.009484   1324.234511

This is very close to what you want.
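
As a side note, newer pandas versions (1.2 and later) can build the Cartesian product directly, so the dummy column is not needed. The snippet below is a minimal sketch of that idea, not part of the original answer; left and right are small hypothetical stand-ins for ref_df_dd and comp_df, filled with a few values from the question.

```python
import pandas as pd

# Hypothetical stand-ins for ref_df_dd and comp_df (values from the question)
left = pd.DataFrame({
    "grp_id": ["M-00353", "M-00538"],
    "lat": [0.43462, 1.74887222],
    "long": [101.822603, 101.3710944],
})
right = pd.DataFrame({
    "uml_id": ["PO1000000021", "PO1000000054", "PO1000000058"],
    "Latitude": [0.4344444, 0.077043, -2.6081],
    "Longitude": [101.825, 102.030838, 104.12525],
})

# how="cross" pairs every row of `left` with every row of `right`;
# no key column is involved (requires pandas 1.2 or newer)
pairs = pd.merge(left, right, how="cross")

print(pairs.shape)  # one row per (left row, right row) pair
```

The distance and pivot steps should carry over unchanged, since the merged frame has the same columns as the dummy-column version.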
Another solution (more transparent than the one above and about twice as fast) uses itertools.product.

from itertools import product

# create a data frame by iterating over row pairs in the Cartesian product
merged_df = pd.DataFrame([{
    'grp_id': r.grp_id,
    'uml_id': c.uml_id,
    'distance': distance_km((r.lat, r.long), (c.Latitude, c.Longitude))
} for r, c in product(ref_df_dd.itertuples(), comp_df.itertuples())])

# pivot table
merged_df.set_index(['grp_id', 'uml_id']).distance.unstack()

This produces the same table as above.
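
Both approaches above still call the distance function once per pair in Python. If an approximation is acceptable, the whole matrix can be computed in one shot with NumPy broadcasting. The sketch below is an illustration under assumptions, not part of the original answer: haversine_matrix is a hypothetical helper that uses the haversine formula on a spherical Earth, so its values will differ slightly from the geodesic distances that geopy.distance.distance returns. The ref and comp frames hold the de-duplicated coordinates from the question.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_matrix(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between every point in set 1 and set 2."""
    # Column vectors for set 1, row vectors for set 2, so that arithmetic
    # broadcasts to a (len(set 1), len(set 2)) matrix
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # clip guards against rounding pushing `a` slightly outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# De-duplicated coordinates from the question
ref = pd.DataFrame({
    "grp_id": ["M-00353", "M-00538", "M-00160", "M-00509"],
    "lat": [0.43462, 1.74887222, -2.6081, -2.378258],
    "long": [101.822603, 101.3710944, 104.12525, 112.542356],
})
comp = pd.DataFrame({
    "uml_id": ["PO1000000021", "PO1000000054", "PO1000000058", "PO1000000106"],
    "Latitude": [0.4344444, 0.077043, -2.6081, -2.381111],
    "Longitude": [101.825, 102.030838, 104.12525, 112.539722],
})

# Rows are labelled by grp_id, columns by uml_id
dist_df = pd.DataFrame(
    haversine_matrix(ref["lat"], ref["long"], comp["Latitude"], comp["Longitude"]),
    index=ref["grp_id"],
    columns=comp["uml_id"],
)
print(dist_df.round(3))
```

This trades geodesic accuracy for speed, so it is best suited to cases where small deviations from the geopy values do not matter.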