Minimum-distance point bundles in Python

Problem description

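The code below works well (if slowly) with 10k points. I now have over 100k, and it takes forever. The idea is to take all the points in each "subs" group and build a list of point bundles ordered by distance. Where I'm stuck is that I don't know how to avoid running the loop over all the points again after each bundle is created. Thanks in advance. Edited: thanks for the feedback.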
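Each pass of the loop performs one greedy step, which can be tried in isolation. The sketch below uses the first five FT coordinates from the sample data and an illustrative k = 3 (the question uses cluster_size = 5). It queries every point's k nearest neighbors with SciPy's KDTree (each point counts as its own nearest neighbor, at distance 0), scores each point by the sum of those distances, and takes the lowest-scoring point's neighborhood as the next bundle. Distances are plain Euclidean distances in degrees, as in the question's code.

```python
import numpy as np
from scipy.spatial import KDTree

# First five FT rows of the sample data: (Latitude, Longitude)
pts = np.array([
    [36.911907, -119.737828],
    [36.885440, -119.766252],
    [36.880977, -119.771592],
    [36.880570, -119.734737],
    [36.878360, -119.757942],
])
k = 3  # illustrative bundle size for this toy example

tree = KDTree(pts)
# dist[i] / idx[i]: distances to and indices of the k nearest points to point i,
# the point itself included (distance 0)
dist, idx = tree.query(pts, k=k)

# Score each candidate seed by the total distance to its k nearest points
scores = dist.sum(axis=1)
seed = int(np.argmin(scores))
bundle = idx[seed]  # indices of the points forming the tightest bundle

print(seed, sorted(bundle.tolist()))  # prints: 1 [1, 2, 4]
```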

Sample data:

```
    Latitude    Longitude   subs
0   36.911907   -119.737828 FT
1   36.885440   -119.766252 FT
2   36.880977   -119.771592 FT
3   36.880570   -119.734737 FT
4   36.878360   -119.757942 FT
5   36.874763   -119.762700 FT
6   36.874557   -119.749833 FT
7   36.873540   -119.754447 FT
8   36.872280   -119.768453 FT
9   36.870497   -119.747387 FT
10  38.057302   -121.358088 QX
11  38.055988   -121.360262 QX
12  38.055808   -121.256427 QX
13  38.054515   -121.347048 QX
14  38.054130   -121.359593 QX
15  38.053633   -121.351737 QX
16  38.052527   -121.340620 QX
17  38.050705   -121.332503 QX
18  38.050264   -121.318113 QX
19  38.048713   -121.388332 QX
```
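To run the code against the sample data, the table above can be read straight into point_df. This is a minimal sketch that assumes the table is pasted as whitespace-separated text. Because the header row has one field fewer than the data rows, pandas uses the first column as the row index.

```python
import io

import pandas as pd

# The sample data, pasted as whitespace-separated text
SAMPLE = """\
Latitude    Longitude   subs
0   36.911907   -119.737828 FT
1   36.885440   -119.766252 FT
2   36.880977   -119.771592 FT
3   36.880570   -119.734737 FT
4   36.878360   -119.757942 FT
5   36.874763   -119.762700 FT
6   36.874557   -119.749833 FT
7   36.873540   -119.754447 FT
8   36.872280   -119.768453 FT
9   36.870497   -119.747387 FT
10  38.057302   -121.358088 QX
11  38.055988   -121.360262 QX
12  38.055808   -121.256427 QX
13  38.054515   -121.347048 QX
14  38.054130   -121.359593 QX
15  38.053633   -121.351737 QX
16  38.052527   -121.340620 QX
17  38.050705   -121.332503 QX
18  38.050264   -121.318113 QX
19  38.048713   -121.388332 QX
"""

# sep=r"\s+" splits on runs of whitespace; the extra leading field in each
# data row becomes the index
point_df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
print(point_df.shape)
```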

Here is my code:

```python
import time

import pandas as pd
import scipy.spatial as spatial

# point_df needs 'Latitude', 'Longitude' and 'subs' columns, as in the sample data.
# "points.csv" is a placeholder file name; load your own data here.
point_df = pd.read_csv("points.csv")

# Give starting values to all the points: bundle 0 means "not bundled yet".
# object dtype lets the column hold both 0 and bundle names such as "FT_0".
point_df['bundle'] = pd.Series(0, index=point_df.index, dtype=object)
point_df['distance'] = -1.0

cluster_size = 5  # points per bundle
bi = 0            # running bundle number, shared across subareas

start_time = time.time()

# Loop over the subareas; within each one, keep bundling while at least
# cluster_size unbundled points remain
for s in point_df['subs'].unique():
    # Unbundled points of this subarea; reset_index() keeps the original
    # row labels in an 'index' column
    s_df = point_df[(point_df['subs'] == s) & (point_df['bundle'] == 0)].reset_index()

    while len(s_df) >= cluster_size:
        # Euclidean distance on raw latitude/longitude degrees
        geo_data = s_df[['Latitude', 'Longitude']].to_numpy()
        tree = spatial.KDTree(geo_data)

        # Query once and reuse both results; each row holds a point's
        # cluster_size nearest points (itself included, at distance 0)
        dist, nbr = tree.query(geo_data, k=cluster_size)
        sums = dist.sum(axis=1)

        # The point whose neighborhood has the least total distance
        best = sums.argmin()

        # Add every point in that neighborhood to bundle "<subs>_<bi>",
        # addressing point_df by label (.loc) rather than by position (.iloc)
        labels = s_df['index'].to_numpy()[nbr[best]]
        point_df.loc[labels, 'bundle'] = f"{s}_{bi}"
        point_df.loc[labels, 'distance'] = sums[best]
        bi += 1

        # Re-select the remaining unbundled points; this scans every row
        # of point_df, not just this subarea's
        s_df = point_df[(point_df['subs'] == s) & (point_df['bundle'] == 0)].reset_index()

end_time = time.time()
print(end_time - start_time)
```
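One possible way to stop touching every point after each bundle is sketched below under stated assumptions; it is not a confirmed answer to the question. It works per "subs" group on NumPy arrays, switches bundled points off in a boolean mask instead of re-filtering the whole DataFrame, queries the tree once per iteration, and writes labels back once per group. It keeps the greedy rule of the code above (same seed choice, same "<subs>_<n>" names), but unbundled points get None instead of 0. The function name bundle_points is hypothetical, and it assumes unique index labels. It still rebuilds the KD-tree on the remaining points of the current group each iteration, so a group of n points still costs on the order of n²/cluster_size neighbor queries; what it removes is the scan over all 100k rows and the pandas overhead per bundle.

```python
import numpy as np
import pandas as pd
from scipy.spatial import KDTree


def bundle_points(point_df: pd.DataFrame, cluster_size: int = 5) -> pd.DataFrame:
    """Greedy minimum-distance bundling (hypothetical helper).

    Works per 'subs' group on NumPy arrays: bundled points are switched off in
    a boolean mask instead of re-filtering the whole DataFrame, and labels are
    written back once per group. Assumes unique index labels.
    """
    if cluster_size < 2:
        raise ValueError("cluster_size must be at least 2")
    out = point_df.copy()
    out["bundle"] = None    # None marks a point that is not bundled
    out["distance"] = -1.0
    bi = 0                  # running bundle number, shared across groups
    for s, grp in point_df.groupby("subs", sort=False):
        labels = grp.index.to_numpy()                     # original row labels
        coords = grp[["Latitude", "Longitude"]].to_numpy()
        alive = np.ones(len(grp), dtype=bool)             # True = still unbundled
        bundle_of = np.empty(len(grp), dtype=object)      # starts filled with None
        dist_of = np.full(len(grp), -1.0)
        while alive.sum() >= cluster_size:
            pos = np.flatnonzero(alive)                   # group positions of unbundled points
            tree = KDTree(coords[pos])                    # built on the remaining points only
            dist, nbr = tree.query(coords[pos], k=cluster_size)
            sums = dist.sum(axis=1)
            seed = int(np.argmin(sums))                   # first tightest neighborhood, like idxmin
            members = pos[nbr[seed]]                      # map tree indices back to group positions
            bundle_of[members] = f"{s}_{bi}"
            dist_of[members] = sums[seed]
            alive[members] = False
            bi += 1
        done = ~alive
        out.loc[labels[done], "bundle"] = bundle_of[done]
        out.loc[labels[done], "distance"] = dist_of[done]
    return out
```

Usage, with point_df loaded as in the question: result = bundle_points(point_df, cluster_size=5). The input DataFrame is left unchanged; the bundles come back in the returned copy.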

Tags: python-3.x, cluster-analysis
