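r - Error converting from double to logical when using lapply on a tibble
Problem description
Edit: It looks like this is a known issue with the "cascade" method. Results that come back as NA after the first attempt cannot be converted to double when a subsequent method returns latitude/longitude.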
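The type mismatch described in the edit can be reproduced without any geocoding. This is a minimal sketch, assuming the first pass leaves `lat` as a bare `NA` (which is logical) and a later pass tries to write a double into it; it uses `vctrs::vec_cast()` directly, which is the function that appears at the bottom of the backtrace, and is not the cascade code itself.

```r
library(vctrs)

# A bare NA is logical; NA_real_ is the double-typed missing value.
na_type      <- typeof(NA)        # "logical"
na_real_type <- typeof(NA_real_)  # "double"

# Casting a non-0/1 double to logical would lose information, so vctrs refuses.
cast_result <- tryCatch(
  vec_cast(36.2, to = logical()),
  error = function(e) "lossy cast refused"
)

# The reverse direction is safe: a logical NA becomes a double NA.
widened <- vec_cast(NA, to = double())
typeof(widened)  # "double"
```

Under that assumption, initialising placeholder coordinates with `NA_real_` instead of `NA` avoids the conflict.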
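Data: I have a list of addresses that need to be geocoded. I am using lapply() for split-apply-combine, which works but is very slow. My attempt to split further before applying and combining returns an error about ambiguous names and sizes, which has me confused.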
# example data
library(dplyr)
library(tidygeocoder)
url <- "https://www.briandunning.com/sample-data/us-500.zip"
download.file(url = url, destfile = basename(url))
adds <- readr::read_csv(basename(url)) %>%
  select(address, city, county, state, zip) %>%
  mutate(date = seq.Date(as.Date('2015-01-01'), to = Sys.Date(), length.out = 500)) %>%
  mutate(year = lubridate::year(date)) %>%
  # to keep it small
  sample_n(20)
This works: it splits the addresses by year, applies the tidygeocoder function to return latitude/longitude, and then recombines the results.
adds_by_year <- adds %>% split(.$year)
geo_list <- lapply(adds_by_year, function(x) {
  geo <- geocode(.tbl = x,
                 street = address,
                 city = city,
                 county = county,
                 state = state,
                 postalcode = zip,
                 # cascade method uses all options (census, osm, etc)
                 # takes longer but may be more accurate
                 method = "cascade", timeout = 500) %>%
    filter(!is.na(lat))
  return(geo)
})
out <- bind_rows(geo_list)
The following does not:
adds <- adds %>%
  mutate(yrmn = zoo::as.yearmon(date))
adds_by_yrm <- adds %>% split(.$yrmn)
geo_list <- lapply(adds_by_yrm, function(x) {
  geo <- geocode(.tbl = x,
                 street = address,
                 city = city,
                 county = county,
                 state = state,
                 postalcode = zip,
                 # cascade method uses all options (census, osm, etc)
                 # takes longer but may be more accurate
                 method = "cascade", timeout = 500) %>%
    filter(!is.na(lat))
  return(geo)
})
out <- bind_rows(geo_list)
It returns this error:
Error: Assigned data `retry_results` must be compatible with existing data.
ℹ Error occurred for column `lat`.
x Can't convert from <double> to <logical> due to loss of precision.
* Locations: 1.
Run `rlang::last_error()` to see where the error occurred.
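I did some searching and found this, but the suggested solution, wrapping x in as.data.frame(), resulted in the same error. Any insight is appreciated. I have looked into using purrr, but I am not sure I fully understand it.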
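For reference, the split-apply-combine pattern looks nearly identical in purrr. The sketch below runs on toy data and uses `add_group_size()`, a hypothetical stand-in for the geocoding call so that nothing touches the network; it only illustrates that `purrr::map()` can replace `lapply()` here and does not by itself change the type error.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy data standing in for the address table (values are made up).
toy <- tibble(
  address = c("a", "b", "c", "d"),
  year    = c(2015, 2015, 2016, 2017)
)

# Hypothetical stand-in for geocode(): adds the number of rows in the group.
add_group_size <- function(x) mutate(x, n_in_group = n())

by_year <- split(toy, toy$year)

# Base R apply step, then combine.
out_lapply <- bind_rows(lapply(by_year, add_group_size))

# purrr apply step, then combine.
out_purrr <- bind_rows(map(by_year, add_group_size))
```

Both versions return a list of tibbles named by group, so `bind_rows()` combines them in the same way.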
Here is the full backtrace, which I am not familiar enough with to fully parse:
Backtrace:
█
1. ├─base::lapply(...)
2. │ └─global::FUN(X[[i]], ...)
3. │ └─tidygeocoder::geocode(...)
4. │ ├─base::do.call(geo, geo_args)
5. │ └─(function (address = NULL, street = NULL, city = NULL, county = NULL, ...
6. │ ├─base::do.call(geo_cascade, all_args[!names(all_args) %in% c("method")])
7. │ └─(function (..., cascade_order = c("census", "osm")) ...
8. │ ├─base::`[<-`(...)
9. │ └─tibble:::`[<-.tbl_df`(...)
10. │ └─tibble:::tbl_subassign(x, i, j, value, i_arg, j_arg, substitute(value))
11. │ └─tibble:::tbl_subassign_row(x, i, value, value_arg)
12. │ ├─base::withCallingHandlers(...)
13. │ └─vctrs::`vec_slice<-`(`*tmp*`, i, value = value[[j]])
14. │ └─(function () ...
15. │ └─vctrs:::vec_cast.logical.double(...)
16. │ └─vctrs::maybe_lossy_cast(out, x, to, lossy, x_arg = x_arg, to_arg = to_arg)
17. │ ├─base::withRestarts(...)
18. │ │ └─base:::withOneRestart(expr, restarts[[1L]])
19. │ │ └─base:::doWithOneRestart(return(expr), restart)
20. │ └─vctrs:::stop_lossy_cast(...)
21. │ └─vctrs:::stop_vctrs(...)
22. │ └─rlang::abort(message, class = c(class, "vctrs_error"), ...)
23. │ └─rlang:::signal_abort(cnd)
24. │ └─base::signalCondition(cnd)
25. └─(function (cnd) ...
Solution
It works with dplyr 1.0.6:
dplyr::bind_rows(geo_list)
# A tibble: 8 x 11
address city county state zip date year yrmn lat long geo_method
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <yearmon> <dbl> <dbl> <chr>
1 134 Lewis Rd Nashville Davidson TN 37211 2016-11-06 2016 Nov 2016 36.2 -86.8 osm
2 6651 Municipal Rd Houma Terrebonne LA 70360 2017-02-03 2017 Feb 2017 29.6 -90.7 osm
3 189 Village Park Rd Crestview Okaloosa FL 32536 2017-08-25 2017 Aug 2017 30.8 -86.6 osm
4 9122 Carpenter Ave New Haven New Haven CT 06511 2018-01-14 2018 Jan 2018 41.5 -72.8 osm
5 5221 Bear Valley Rd Nashville Davidson TN 37211 2018-09-17 2018 Sep 2018 36.1 -86.8 osm
6 28 S 7th St #2824 Englewood Bergen NJ 07631 2020-03-31 2020 Mar 2020 40.9 -74.0 census
7 5 E Truman Rd Abilene Taylor TX 79602 2021-02-25 2021 Feb 2021 32.5 -99.7 osm
8 9 Front St Washington District of Columbia DC 20001 2021-05-16 2021 May 2021 38.9 -77.0 osm
Note that some list elements have 0 rows. Perhaps we can drop those 0-row elements and then use bind_rows:
library(purrr)
library(dplyr)
geo_list %>%
  keep(~ NROW(.x) > 0) %>%
  bind_rows()
# A tibble: 8 x 11
address city county state zip date year yrmn lat long geo_method
<chr> <chr> <chr> <chr> <chr> <date> <dbl> <yearmon> <dbl> <dbl> <chr>
1 134 Lewis Rd Nashville Davidson TN 37211 2016-11-06 2016 Nov 2016 36.2 -86.8 osm
2 6651 Municipal Rd Houma Terrebonne LA 70360 2017-02-03 2017 Feb 2017 29.6 -90.7 osm
3 189 Village Park Rd Crestview Okaloosa FL 32536 2017-08-25 2017 Aug 2017 30.8 -86.6 osm
4 9122 Carpenter Ave New Haven New Haven CT 06511 2018-01-14 2018 Jan 2018 41.5 -72.8 osm
5 5221 Bear Valley Rd Nashville Davidson TN 37211 2018-09-17 2018 Sep 2018 36.1 -86.8 osm
6 28 S 7th St #2824 Englewood Bergen NJ 07631 2020-03-31 2020 Mar 2020 40.9 -74.0 census
7 5 E Truman Rd Abilene Taylor TX 79602 2021-02-25 2021 Feb 2021 32.5 -99.7 osm
8 9 Front St Washington District of Columbia DC 20001 2021-05-16 2021 May 2021 38.9 -77.0 osm
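The filtering step above can be tried on toy data. This is a small sketch, assuming a list shaped like `geo_list` where one group came back empty; `toy_list` and its values are made up for illustration. It also shows a base R equivalent using `Filter()`, and uses the `.id` argument of `bind_rows()` to keep the list names as a column.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy stand-in for geo_list: the middle element has 0 rows.
toy_list <- list(
  `Jan 2020` = tibble(address = "9 Front St", lat = 38.9, long = -77.0),
  `Feb 2020` = tibble(address = character(), lat = double(), long = double()),
  `Mar 2020` = tibble(address = "134 Lewis Rd", lat = 36.2, long = -86.8)
)

# purrr: keep only the elements that have at least one row.
non_empty <- keep(toy_list, ~ nrow(.x) > 0)

# Base R equivalent of the same filter.
non_empty_base <- Filter(function(x) nrow(x) > 0, toy_list)

# Combine, storing each element's name in a new "yrmn" column.
combined <- bind_rows(non_empty, .id = "yrmn")
```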