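python-3.x - Minimum-distance point bundles in Python
Problem description
The code below works fine (if slowly) when I have 10k points. I now have over 100k, and it takes forever. The idea is to take all the points within each "subs" value and build a list of groups of nearby points, ordered by distance (tightest group first). Where I'm stuck: after each group is created, I don't know how to avoid re-running the loop over all the remaining points. Thanks in advance. Edit: thanks for the feedback.
Sample data: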
     Latitude    Longitude  subs
 0  36.911907  -119.737828  FT
 1  36.885440  -119.766252  FT
 2  36.880977  -119.771592  FT
 3  36.880570  -119.734737  FT
 4  36.878360  -119.757942  FT
 5  36.874763  -119.762700  FT
 6  36.874557  -119.749833  FT
 7  36.873540  -119.754447  FT
 8  36.872280  -119.768453  FT
 9  36.870497  -119.747387  FT
10  38.057302  -121.358088  QX
11  38.055988  -121.360262  QX
12  38.055808  -121.256427  QX
13  38.054515  -121.347048  QX
14  38.054130  -121.359593  QX
15  38.053633  -121.351737  QX
16  38.052527  -121.340620  QX
17  38.050705  -121.332503  QX
18  38.050264  -121.318113  QX
19  38.048713  -121.388332  QX
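The question never shows how point_df is created. To reproduce it without the original file, the 20 sample rows can be loaded directly. This is a minimal sketch assuming the sample is representative; the real 100k+ point set would presumably be read from a file instead, and the name SAMPLE is hypothetical.

```python
import io

import pandas as pd

# The 20 sample rows from the question (index column omitted), whitespace-separated.
SAMPLE = """\
Latitude Longitude subs
36.911907 -119.737828 FT
36.885440 -119.766252 FT
36.880977 -119.771592 FT
36.880570 -119.734737 FT
36.878360 -119.757942 FT
36.874763 -119.762700 FT
36.874557 -119.749833 FT
36.873540 -119.754447 FT
36.872280 -119.768453 FT
36.870497 -119.747387 FT
38.057302 -121.358088 QX
38.055988 -121.360262 QX
38.055808 -121.256427 QX
38.054515 -121.347048 QX
38.054130 -121.359593 QX
38.053633 -121.351737 QX
38.052527 -121.340620 QX
38.050705 -121.332503 QX
38.050264 -121.318113 QX
38.048713 -121.388332 QX
"""

# sep=r"\s+" splits on any run of whitespace; pandas supplies a 0..19 RangeIndex.
point_df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
```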
Here is my code:
import time

import pandas as pd
import scipy.spatial as spatial

# point_df is assumed to be loaded already, with the Latitude, Longitude and subs columns shown above
# give starting values to all the points (object dtype so bundle names can replace the 0 placeholder)
point_df['bundle'] = pd.Series(0, index=point_df.index, dtype=object)
point_df['distance'] = -1.0
start_time = time.time()
cluster_size = 5
bi = 0
# loop over the subareas, bundling as long as at least cluster_size unbundled points remain
for s in point_df['subs'].unique():
    s_df = point_df[(point_df['subs'] == s) & (point_df['bundle'] == 0)].reset_index()
    while len(s_df) >= cluster_size:
        # group the lat/long together as tuples and (re)build the tree over the remaining points
        geo_data = list(zip(s_df['Latitude'], s_df['Longitude']))
        tree = spatial.KDTree(geo_data)
        # query once and reuse both the distances and the neighbour indices
        dists, idxs = tree.query(geo_data, cluster_size)
        s_df['neighbors'] = pd.Series(idxs.tolist())
        s_df['distance'] = dists.sum(axis=1)
        min_distance = s_df.loc[s_df['distance'].idxmin()]  # get the least-distance record
        # add each point to the bundle with the least distance; bundles are named "<subs>_<bi>"
        for i in min_distance['neighbors']:
            # s_df['index'] holds the original point_df labels, so use .loc rather than .iloc
            point_df.loc[s_df['index'][i], 'bundle'] = f"{s}_{bi}"
            point_df.loc[s_df['index'][i], 'distance'] = min_distance['distance']
        s_df = point_df[(point_df['subs'] == s) & (point_df['bundle'] == 0)].reset_index()
        bi += 1
end_time = time.time()
print(end_time - start_time)
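One possible way to stop rebuilding the KD-tree and re-querying every remaining point after each bundle is a lazy-greedy priority queue. This is a sketch under my own assumptions, not the original poster's method: it keeps the question's greedy rule and plain Euclidean distance on raw latitude/longitude, builds one scipy.spatial.KDTree per "subs" value, and relies on the fact that removing points can only increase a point's summed distance to its cluster_size nearest remaining neighbours. Stale heap keys are therefore lower bounds, so when the smallest key's neighbour list is still entirely unbundled, that key is exact and must be the current minimum. Otherwise the point is re-scored by asking the tree for progressively more neighbours and skipping bundled ones. bundle_points and _knn_among_free are hypothetical helper names.

```python
import heapq

import numpy as np
import pandas as pd
from scipy.spatial import KDTree


def _knn_among_free(tree, point, free, k):
    """Distance sum and indices of the k nearest still-unbundled points.

    The tree is never rebuilt: we request more neighbours than needed, drop
    the bundled ones, and double the request until k free points are found.
    The caller must guarantee free.sum() >= k.
    """
    n = len(free)
    m = k
    while True:
        m = min(m, n)
        dists, idxs = tree.query(point, m)
        dists, idxs = np.atleast_1d(dists), np.atleast_1d(idxs)  # m == 1 returns scalars
        keep = free[idxs]
        if keep.sum() >= k or m == n:
            return float(dists[keep][:k].sum()), idxs[keep][:k]
        m *= 2


def bundle_points(point_df, cluster_size=5):
    """Greedy minimum-distance bundling with lazily re-scored heap entries."""
    subs = point_df['subs'].to_numpy()
    xy = point_df[['Latitude', 'Longitude']].to_numpy(dtype=float)
    bundle = np.zeros(len(point_df), dtype=object)  # 0 = not bundled, as in the question
    distance = np.full(len(point_df), -1.0)
    bi = 0
    for s in pd.unique(subs):
        pos = np.flatnonzero(subs == s)  # row positions of this sub's points
        coords = xy[pos]
        n = len(coords)
        if n < cluster_size:
            continue
        tree = KDTree(coords)  # built once per sub, never rebuilt
        free = np.ones(n, dtype=bool)
        n_free = n
        # One batched query gives every point's initial score and neighbour list.
        d0, i0 = tree.query(coords, cluster_size)
        d0 = d0.reshape(n, cluster_size)
        nbrs = list(i0.reshape(n, cluster_size))
        heap = [(d0[j].sum(), j) for j in range(n)]
        heapq.heapify(heap)
        while heap and n_free >= cluster_size:
            score, j = heapq.heappop(heap)
            if not free[j]:
                continue  # point already bundled: drop its entry
            if free[nbrs[j]].all():
                # Key is exact and <= every other (lower-bound) key: bundle this group.
                members = pos[nbrs[j]]
                bundle[members] = f"{s}_{bi}"
                distance[members] = score
                free[nbrs[j]] = False
                n_free -= cluster_size
                bi += 1
            else:
                # Stale entry: re-score among the remaining points and push it back.
                score, nbrs[j] = _knn_among_free(tree, coords[j], free, cluster_size)
                heapq.heappush(heap, (score, j))
    out = point_df.copy()
    out['bundle'] = bundle
    out['distance'] = distance
    return out
```

Usage, assuming point_df is loaded as in the question: result = bundle_points(point_df, cluster_size=5). It should choose the same groups as the loop above, up to ties and floating-point rounding, but it has not been benchmarked on 100k points, and in the worst case many entries may still need re-scoring. Like the original, it measures distance in raw degrees; a haversine metric (for example sklearn's BallTree with metric='haversine' on radians) would be needed for true ground distance.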