Using nearest neighbour to find the nearest existing postcode for new postcodes

Problem description

I have a list of new postcodes, and for each one I am trying to find the nearest postcode in an existing postcode file and attach it to the new postcode. I am using the code below, but it seems to have duplicated some rows. Could I have some help resolving this?

My 2 dataframes are:

new_postcode_df which contains 92,590 rows, and columns:

current_postcode_df which contains 1,738,339 rows, and columns:

My desired output is output_df.

My code is below:

new_postcode_df_gps    = new_postcode_df[["lat", "long"]].values
current_postcode_df_gps = current_postcode_df[["Latitude", "Longitude"]].values

new_postcode_df_radians     = np.radians(new_postcode_df_gps)
current_postcode_df_radians = np.radians(current_postcode_df_gps)

tree = BallTree(current_postcode_df_radians , leaf_size=15, metric='haversine')

distance, index = tree.query(new_postcode_df_radians, k=1)

earth_radius = 6371000
distance_in_meters = distance * earth_radius
current_postcode_df.Postcode_NS[index[:,0]]

My output is shown in the attached image, where postcodes beginning with "GY" appear near the top, which should not be the case. Postcodes starting with "AB" should all be at the top.

The new dataframe has increased from 92,590 rows to 92,848 rows.

Image of final output dataframe

Libraries I'm using are:

import pandas  as pd
import numpy   as np
from sklearn.neighbors import BallTree

new_postcode_df = pd.DataFrame({"Postcode":["AB101BJ", "AB101BL", "AB107FU"],
                                "Latitude":[57.146051, 57.148655, 57.119636],
                                "Longitude":[-2.107375, -2.097433, -2.147906]})

current_postcode_df = pd.DataFrame({"Postcode":["AB101AB", "AB101AF", "AB101AG"],
                                    "Latitude":[57.149606, 57.148707, 57.149051],
                                    "Longitude":[-2.096916, -2.097806, -2.097004]})

output_df = pd.DataFrame({"Postcode":["AB101RS", "AB129TS", "GY35HG"]})

Tags: nearest-neighbor

Solution
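
One likely cause, assuming current_postcode_df does not have a default 0..n-1 index (for example because it was filtered or concatenated without resetting the index): tree.query returns row positions, but current_postcode_df.Postcode_NS[index[:,0]] looks those numbers up as index labels. If a label occurs more than once, a single lookup returns several rows, which would be consistent with 92,590 rows growing to 92,848, and labels that do not match positions would be consistent with unexpected postcodes such as "GY" appearing. The sketch below reproduces the effect on the sample data from the question and uses a positional lookup instead. The duplicated index [0, 1, 1] is made up for illustration, and nearest_postcode and distance_m are hypothetical column names.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

new_postcode_df = pd.DataFrame({
    "Postcode": ["AB101BJ", "AB101BL", "AB107FU"],
    "Latitude": [57.146051, 57.148655, 57.119636],
    "Longitude": [-2.107375, -2.097433, -2.147906],
})

# Assumption for illustration: a non-default index with a repeated label,
# as can happen after filtering or concatenating dataframes.
current_postcode_df = pd.DataFrame({
    "Postcode": ["AB101AB", "AB101AF", "AB101AG"],
    "Latitude": [57.149606, 57.148707, 57.149051],
    "Longitude": [-2.096916, -2.097806, -2.097004],
}, index=[0, 1, 1])

# The haversine metric expects [latitude, longitude] in radians.
new_rad = np.radians(new_postcode_df[["Latitude", "Longitude"]].to_numpy())
cur_rad = np.radians(current_postcode_df[["Latitude", "Longitude"]].to_numpy())

tree = BallTree(cur_rad, leaf_size=15, metric="haversine")
distance, index = tree.query(new_rad, k=1)
positions = index[:, 0]  # row positions in cur_rad, not index labels

# Label-based lookup: each repeated label returns every matching row.
by_label = current_postcode_df["Postcode"].loc[positions]

# Positional lookup: exactly one result per new postcode.
nearest = current_postcode_df["Postcode"].to_numpy()[positions]

earth_radius_m = 6371000  # mean Earth radius in metres
output_df = new_postcode_df.copy()
output_df["nearest_postcode"] = nearest
output_df["distance_m"] = distance[:, 0] * earth_radius_m

print(len(by_label), len(output_df))
print(output_df)
```

Assigning a plain NumPy array also avoids index alignment when the result is attached to new_postcode_df. Calling current_postcode_df.reset_index(drop=True) before building the tree, or using .iloc instead of [], should have the same effect. Checking current_postcode_df.index.is_unique is a quick way to confirm whether duplicate labels are actually present in the real data.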

