Geocoding: an efficient way to find the distance between two sets of locations

Problem description

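I have a set of location coordinates for individual voters and a second set of coordinates for the drop boxes where they can return their ballots. I am trying to find the distance between each voter's residence and the nearest drop box. Below is the code I have right now, adapted from another Stack Overflow example. However, it is not very efficient: my dataset has millions of rows, and the code builds every possible pair of coordinates before pulling out the smallest distance. Is there a more efficient way to approach this?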

What I currently have:

# Made-Up Data
library(geosphere)
library(tidyverse)
geo_voters <- data.frame(voter_id = c(12345, 45678, 89011),
                         long = c(-43.17536, -43.17411, -43.36605),
                         lat = c(-22.95414, -22.9302, -23.00133))

geo_dropoff_boxes <- data.frame(long = c(-43.19155, -43.33636, -67.45666),
                                lat = c(-22.90353, -22.87253, -26.78901))
# Code to find the distance between voters, and the dropoff boxes
# Order into a newdf as needed first.
# First, the voters:  
voter_addresses <- data.frame(voter_id = as.character(geo_voters$voter_id),
                              lon_address = geo_voters$long,
                              lat_address = geo_voters$lat
                              )
# Second, the polling locations: 
polling_address <- data.frame(place_number = 1:nrow(geo_dropoff_boxes),
                       lon_place = geo_dropoff_boxes$long,
                       lat_place = geo_dropoff_boxes$lat
                       )

# Create nested dfs: 
voter_nest <- nest(voter_addresses, voter_coords = -voter_id)
polling_nest <- nest(polling_address, polling_coords = -place_number)

# Combine for combinations: 
data_master <- crossing(voter_nest, polling_nest)

# Calculate shortest distance: 
shortest_dist <- data_master %>% 
  mutate(dist = map2_dbl(voter_coords, polling_coords, distm)) %>% 
  group_by(voter_id) %>% 
  filter(dist == min(dist)) %>%
  mutate(dist_km = dist/1000,
         voter_id = as.character(voter_id)) %>%
  select(voter_id, dist_km)

Tags: r, geocoding, data-cleaning, shortest-path, geosphere

Solution


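The sf package makes this straightforward. The st_as_sf() function converts a data frame of longitude/latitude values into georeferenced points, and st_distance() computes the distances between them. When you call st_as_sf(), you need to specify a coordinate reference system. Your data appear to be latitude and longitude, so I specified crs="epsg:4326", the most common latitude/longitude reference.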

library( sf )

geo_voters <- data.frame(voter_id = c(12345, 45678, 89011),
                         long = c(-43.17536, -43.17411, -43.36605),
                         lat = c(-22.95414, -22.9302, -23.00133))

geo_dropoff_boxes <- data.frame(long = c(-43.19155, -43.33636, -67.45666),
                                lat = c(-22.90353, -22.87253, -26.78901))

# convert the data to sf features
geo_voters = st_as_sf( geo_voters, coords=c('long', 'lat'), crs="epsg:4326" )
geo_dropoff_boxes = st_as_sf( geo_dropoff_boxes, coords=c('long', 'lat'), crs="epsg:4326" )

# calculate the distances between voters and drop boxes
dist = st_distance( geo_voters, geo_dropoff_boxes )
print(dist)

Each row now corresponds to a voter and each column to a drop box; the entries are the distances between them, in meters:

Units: [m]
          [,1]     [,2]    [,3]
[1,]  5866.745 18821.87 2482400
[2,]  3461.945 17813.57 2483210
[3,] 20916.618 14641.09 2462186
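
The question asks only for the distance to the nearest drop box, and a full voters-by-boxes matrix grows quickly when there are millions of voters. One possible way to get just the nearest box is sketched below, using sf's st_nearest_feature() to find the closest box for each voter and then st_distance() with by_element = TRUE to measure only those pairs. The names voters_sf, boxes_sf, box_id and result are illustrative and not part of the original answer, and the performance on a dataset of millions of rows has not been measured here.

```r
library(sf)

# Example data reusing the coordinates from the question;
# box_id is a hypothetical identifier added for illustration
voters <- data.frame(voter_id = c(12345, 45678, 89011),
                     long = c(-43.17536, -43.17411, -43.36605),
                     lat  = c(-22.95414, -22.9302, -23.00133))
boxes <- data.frame(box_id = 1:3,
                    long = c(-43.19155, -43.33636, -67.45666),
                    lat  = c(-22.90353, -22.87253, -26.78901))

# Convert both tables to sf point features in WGS 84 (EPSG:4326)
voters_sf <- st_as_sf(voters, coords = c("long", "lat"), crs = 4326)
boxes_sf  <- st_as_sf(boxes,  coords = c("long", "lat"), crs = 4326)

# For each voter, the row index of the nearest drop box
nearest <- st_nearest_feature(voters_sf, boxes_sf)

# Distance from each voter to its nearest box only (one value per voter),
# instead of the full voters x boxes matrix
dist_m <- st_distance(voters_sf, boxes_sf[nearest, ], by_element = TRUE)

# Assemble a plain data frame; as.numeric() drops the units class
result <- data.frame(voter_id    = voters$voter_id,
                     nearest_box = boxes$box_id[nearest],
                     dist_km     = as.numeric(dist_m) / 1000)
print(result)
```

The result has one row per voter, which matches the shape of shortest_dist in the question. The exact distances may differ slightly from the matrix above depending on the sf version and whether it uses spherical or ellipsoidal calculations.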
