nearest-neighbor - Using nearest neighbour to find the closest existing postcode for each new postcode
Problem description
I have a list of new postcodes, and for each one I want to find the nearest postcode in an existing postcode file and attach it to the new postcode. I am using the code below, but it seems to have duplicated some rows. Could I have some help resolving this?
My two dataframes are:
new_postcode_df, which contains 92,590 rows and the columns:
- Postcode e.g. "AB101BJ"
- Latitude e.g. 57.146051
- Longitude e.g. -2.107375
current_postcode_df, which contains 1,738,339 rows and the columns:
- Postcode e.g. "AB101AB"
- Latitude e.g. 57.149606
- Longitude e.g. -2.096916
My desired output is output_df, with the columns:
- new_postcode e.g. "AB101BJ"
- current_postcode e.g. "AB101AB"
My code is below:
new_postcode_df_gps = new_postcode_df[["lat", "long"]].values
current_postcode_df_gps = current_postcode_df[["Latitude", "Longitude"]].values
new_postcode_df_radians = np.radians(new_postcode_df_gps)
current_postcode_df_radians = np.radians(current_postcode_df_gps)
tree = BallTree(current_postcode_df_radians , leaf_size=15, metric='haversine')
distance, index = tree.query(new_postcode_df_radians, k=1)
earth_radius = 6371000
distance_in_meters = distance * earth_radius
current_postcode_df.Postcode_NS[index[:,0]]
My output is shown in the attached image, where you can see that postcodes beginning with "GY" have been added near the top, which should not be the case. Postcodes starting with "AB" should all be at the top.
The new dataframe has increased from 92,590 rows to 92,848 rows.
Image of final output dataframe
Libraries I'm using are:
import pandas as pd
import numpy as np
from sklearn.neighbors import BallTree
new_postcode_df = pd.DataFrame({"Postcode":["AB101BJ", "AB101BL", "AB107FU"],
"Latitude":[57.146051, 57.148655, 57.119636],
"Longitude":[-2.107375, -2.097433, -2.147906]})
current_postcode_df = pd.DataFrame({"Postcode":["AB101AB", "AB101AF", "AB101AG"],
"Latitude":[57.149606, 57.148707, 57.149051],
"Longitude":[-2.096916, -2.097806, -2.097004]})
output_df = pd.DataFrame({"Postcode":["AB101RS", "AB129TS", "GY35HG"]})
Solution
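One possible cause, offered as a guess because the full data is not shown: tree.query returns positions into the array the tree was built from, whereas current_postcode_df.Postcode_NS[index[:,0]] looks those numbers up as index labels. If the index of current_postcode_df is not a unique 0..n-1 range (for example after a concat or a filter), a label can match several rows or the wrong row, which would explain both the extra rows and the out-of-place "GY" postcodes. The sketch below uses the sample data from the question, assumes the columns are named Postcode, Latitude and Longitude as described, and selects by position instead. The distance_m column is an addition for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

# Sample data copied from the question.
new_postcode_df = pd.DataFrame({"Postcode": ["AB101BJ", "AB101BL", "AB107FU"],
                                "Latitude": [57.146051, 57.148655, 57.119636],
                                "Longitude": [-2.107375, -2.097433, -2.147906]})
current_postcode_df = pd.DataFrame({"Postcode": ["AB101AB", "AB101AF", "AB101AG"],
                                    "Latitude": [57.149606, 57.148707, 57.149051],
                                    "Longitude": [-2.096916, -2.097806, -2.097004]})

# The haversine metric expects [latitude, longitude] pairs in radians.
new_rad = np.radians(new_postcode_df[["Latitude", "Longitude"]].to_numpy())
current_rad = np.radians(current_postcode_df[["Latitude", "Longitude"]].to_numpy())

tree = BallTree(current_rad, leaf_size=15, metric="haversine")
distance, index = tree.query(new_rad, k=1)

# index holds row positions in current_rad, so select by position
# (plain NumPy array here; .iloc would work too), never by index label.
nearest_postcodes = current_postcode_df["Postcode"].to_numpy()[index[:, 0]]

earth_radius_m = 6371000  # mean Earth radius in metres, as in the question
output_df = pd.DataFrame({
    "new_postcode": new_postcode_df["Postcode"].to_numpy(),
    "current_postcode": nearest_postcodes,
    "distance_m": distance[:, 0] * earth_radius_m,  # illustrative extra column
})
```

Because every column is built from a plain array of the same length as new_postcode_df, output_df always has exactly one row per new postcode. To check whether this is what happened with the real data, current_postcode_df.index.is_unique reports whether the index contains duplicate labels.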