Error converting from double to logical when using lapply on a tibble

Problem description

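Edit: It looks like this is a known issue with the "cascade" method. A result that came back as NA on the first attempt cannot be converted to a double when a subsequent method returns a latitude/longitude.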
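To make the type mismatch concrete, here is a minimal sketch that does not call any geocoding service. It assumes the vctrs package is installed (the backtrace further down ends in `vctrs:::vec_cast.logical.double`), and the value 36.2 is just an illustrative latitude. It is not the internal code of tidygeocoder, only an illustration of why a logical `NA` placeholder cannot later hold a coordinate.

```r
# A bare NA is logical, while a coordinate is a double.
class(NA)        # "logical"
class(NA_real_)  # "numeric"

# Assumption: vctrs is available. The cast below mirrors the last frames
# of the backtrace: a double with a fractional part cannot become logical.
if (requireNamespace("vctrs", quietly = TRUE)) {
  cast_msg <- tryCatch(
    {
      vctrs::vec_cast(36.2, logical())
      "cast succeeded"
    },
    error = function(e) conditionMessage(e)
  )
  print(cast_msg)
}

# Starting from a double NA avoids the problem: the placeholder already
# has the type that the later result needs.
lat <- NA_real_
lat[1] <- 36.2
```

If the placeholder column were created with `NA_real_` rather than a bare `NA`, the later assignment would not need a cast at all, which is consistent with the error appearing only for groups where the first method returns nothing.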

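Data: I have a list of addresses that need geocoding. I'm using a split-apply-combine approach with lapply(), which works but is very slow. My idea of splitting further before the apply-combine step returns an error about obscure names and sizes, which has me puzzled.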

# example data 
library(dplyr)
library(tidygeocoder)

url <- "https://www.briandunning.com/sample-data/us-500.zip"
download.file(url = url, destfile = basename(url))

adds <- readr::read_csv(basename(url)) %>%
  select(address, city, 
         county, state, zip) %>%
  mutate(date = seq.Date(as.Date('2015-01-01'), to = Sys.Date(), length.out = 500)) %>%
  mutate(year = lubridate::year(date)) %>%
  # to keep it small 
  sample_n(20)

This works: it splits the addresses by year, applies the tidygeocoder function to return latitude/longitude, and then recombines the results.

adds_by_year <- adds %>% split(.$year)
geo_list <- lapply(adds_by_year, function(x) {
  geo <-  geocode(.tbl = x,
                      street = address,
                      city = city,
                      county = county,
                      state = state,
                      postalcode = zip,
                      # cascade method uses all options (census, osm, etc)
                      # takes longer but may be more accurate
                      method = "cascade", timeout = 500) %>%
    filter(!is.na(lat))
  return(geo)
})

out <- bind_rows(geo_list)

The following does not:

adds <- adds %>%
  mutate(yrmn = zoo::as.yearmon(date))

adds_by_yrm <- adds %>% split(.$yrmn)
geo_list <- lapply(adds_by_yrm, function(x) {
  geo <-  geocode(.tbl = x,
                  street = address,
                  city = city,
                  county = county,
                  state = state,
                  postalcode = zip,
                  # cascade method uses all options (census, osm, etc)
                  # takes longer but may be more accurate
                  method = "cascade", timeout = 500) %>%
    filter(!is.na(lat))
  return(geo)
})

out <- bind_rows(geo_list)

It returns this error:

 Error: Assigned data `retry_results` must be compatible with existing data.
ℹ Error occurred for column `lat`.
x Can't convert from <double> to <logical> due to loss of precision.
* Locations: 1.
Run `rlang::last_error()` to see where the error occurred.
 

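I did some searching and found this, but the suggested solution, wrapping x in as.data.frame(), resulted in the same error. Any insight is appreciated. I have looked into using purrr, but I'm not sure I fully understand it.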

Here is the full backtrace, which I'm not familiar enough with to parse completely:

Backtrace:
     █
  1. ├─base::lapply(...)
  2. │ └─global::FUN(X[[i]], ...)
  3. │   └─tidygeocoder::geocode(...)
  4. │     ├─base::do.call(geo, geo_args)
  5. │     └─(function (address = NULL, street = NULL, city = NULL, county = NULL, ...
  6. │       ├─base::do.call(geo_cascade, all_args[!names(all_args) %in% c("method")])
  7. │       └─(function (..., cascade_order = c("census", "osm")) ...
  8. │         ├─base::`[<-`(...)
  9. │         └─tibble:::`[<-.tbl_df`(...)
 10. │           └─tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
 11. │             └─tibble:::tbl_subassign_row(x, i, value, value_arg)
 12. │               ├─base::withCallingHandlers(...)
 13. │               └─vctrs::`vec_slice<-`(`*tmp*`, i, value = value[[j]])
 14. │                 └─(function () ...
 15. │                   └─vctrs:::vec_cast.logical.double(...)
 16. │                     └─vctrs::maybe_lossy_cast(out, x, to, lossy, x_arg = x_arg, to_arg = to_arg)
 17. │                       ├─base::withRestarts(...)
 18. │                       │ └─base:::withOneRestart(expr, restarts[[1L]])
 19. │                       │   └─base:::doWithOneRestart(return(expr), restart)
 20. │                       └─vctrs:::stop_lossy_cast(...)
 21. │                         └─vctrs:::stop_vctrs(...)
 22. │                           └─rlang::abort(message, class = c(class, "vctrs_error"), ...)
 23. │                             └─rlang:::signal_abort(cnd)
 24. │                               └─base::signalCondition(cnd)
 25. └─(function (cnd) ...

Tags: r, geocoding

Solution


It works with dplyr 1.0.6:

dplyr::bind_rows(geo_list)
# A tibble: 8 x 11
  address             city       county               state zip   date        year yrmn        lat  long geo_method
  <chr>               <chr>      <chr>                <chr> <chr> <date>     <dbl> <yearmon> <dbl> <dbl> <chr>     
1 134 Lewis Rd        Nashville  Davidson             TN    37211 2016-11-06  2016 Nov 2016   36.2 -86.8 osm       
2 6651 Municipal Rd   Houma      Terrebonne           LA    70360 2017-02-03  2017 Feb 2017   29.6 -90.7 osm       
3 189 Village Park Rd Crestview  Okaloosa             FL    32536 2017-08-25  2017 Aug 2017   30.8 -86.6 osm       
4 9122 Carpenter Ave  New Haven  New Haven            CT    06511 2018-01-14  2018 Jan 2018   41.5 -72.8 osm       
5 5221 Bear Valley Rd Nashville  Davidson             TN    37211 2018-09-17  2018 Sep 2018   36.1 -86.8 osm       
6 28 S 7th St #2824   Englewood  Bergen               NJ    07631 2020-03-31  2020 Mar 2020   40.9 -74.0 census    
7 5 E Truman Rd       Abilene    Taylor               TX    79602 2021-02-25  2021 Feb 2021   32.5 -99.7 osm       
8 9 Front St          Washington District of Columbia DC    20001 2021-05-16  2021 May 2021   38.9 -77.0 osm   

Note that some list elements have 0 rows. Perhaps we can remove those 0-row elements and then use bind_rows:

library(purrr)
library(dplyr)
geo_list %>%
    keep(~ NROW(.x) > 0) %>% 
    bind_rows
# A tibble: 8 x 11
  address             city       county               state zip   date        year yrmn        lat  long geo_method
  <chr>               <chr>      <chr>                <chr> <chr> <date>     <dbl> <yearmon> <dbl> <dbl> <chr>     
1 134 Lewis Rd        Nashville  Davidson             TN    37211 2016-11-06  2016 Nov 2016   36.2 -86.8 osm       
2 6651 Municipal Rd   Houma      Terrebonne           LA    70360 2017-02-03  2017 Feb 2017   29.6 -90.7 osm       
3 189 Village Park Rd Crestview  Okaloosa             FL    32536 2017-08-25  2017 Aug 2017   30.8 -86.6 osm       
4 9122 Carpenter Ave  New Haven  New Haven            CT    06511 2018-01-14  2018 Jan 2018   41.5 -72.8 osm       
5 5221 Bear Valley Rd Nashville  Davidson             TN    37211 2018-09-17  2018 Sep 2018   36.1 -86.8 osm       
6 28 S 7th St #2824   Englewood  Bergen               NJ    07631 2020-03-31  2020 Mar 2020   40.9 -74.0 census    
7 5 E Truman Rd       Abilene    Taylor               TX    79602 2021-02-25  2021 Feb 2021   32.5 -99.7 osm       
8 9 Front St          Washington District of Columbia DC    20001 2021-05-16  2021 May 2021   38.9 -77.0 osm       
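For readers who prefer not to load purrr, the same idea can be sketched in base R. The `toy_list` below is a hypothetical stand-in for `geo_list` (one group with a result, one group that ended up empty after the filter), so the snippet runs without calling any geocoding service.

```r
# Hypothetical stand-in for geo_list: one geocoded group and one empty group.
toy_list <- list(
  `Nov 2016` = data.frame(address = "134 Lewis Rd", lat = 36.2, long = -86.8),
  `Dec 2016` = data.frame(address = character(0), lat = numeric(0), long = numeric(0))
)

# Keep only the elements that still have rows after filtering.
non_empty <- Filter(function(x) NROW(x) > 0, toy_list)

# Row-bind what is left; do.call(rbind, ...) is a base R analogue of bind_rows().
out <- do.call(rbind, non_empty)
```

`Filter()` plays the role of `purrr::keep()` here. With real tibbles, `dplyr::bind_rows(non_empty)` may still be the more convenient final step, since it handles differing column sets more gracefully than `rbind()`.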
