r - Base R split() function causes "Error in unique.default(x, nmax = nmax)"
Problem description
I am working with the tidycensus and segregation packages to run an analysis.
To create my data, I run this:
library(tidycensus)
library(tidyverse)
library(segregation)
library(tigris)
library(sf)
los.angeles.indices <- get_acs(
  geography = "tract",
  variables = c(
    white = "B03002_003",
    black = "B03002_004",
    hispanic = "B03002_012"
  ),
  state = "CA",
  geometry = TRUE,
  year = 2012
)
california.cities <- get_acs(
  geography = "place",
  state = "CA",
  variables = "B01001_001",
  geometry = TRUE,
  year = 2012,
  survey = "acs1"
) %>%
  filter(estimate >= 100000) %>%
  transmute(urban_name = str_remove(NAME,
                                    fixed(" city, California")))
ca_city_data <- los.angeles.indices %>%
  st_join(california.cities, left = FALSE) %>%
  select(-NAME) %>%
  st_drop_geometry()
After that, I tried to run this:
inglewood_entropy <- ca_city_data %>%
  filter(urban_name == "Inglewood") %>%
  split(~GEOID) %>%
  map_dbl(~{
    entropy(
      data = .x,
      group = "variable",
      weight = "estimate",
      base = 4
    )
  }) %>%
  as_tibble(rownames = "GEOID") %>%
  rename(entropy = value)
Unfortunately, it resulted in this error:
Error in unique.default(x, nmax = nmax) :
unique() applies only to vectors
The final result should look like this:
inglewood_entropy
#> # A tibble: 50 × 2
#>    GEOID       entropy
#>    <chr>         <dbl>
#>  1 06037234902   0.576
#>  2 06037235100   0.453
#>  3 06037235201   0.548
#>  4 06037235202   0.550
#>  5 06037237900   0.357
#>  6 06037238000   0.420
#>  7 06037238100   0.421
#>  8 06037238400   0.430
#>  9 06037276100   0.590
#> 10 06037277100   0.506
#> # … with 40 more rows
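What is really odd is that I asked the creator of the segregation package about this, and he was able to run the code perfectly!
You can see our short discussion, in which it worked for him, here: segregation GitHub issue
I am fairly sure the problem lies in the split(~GEOID) part of the code, but I am not certain.
In any case, I am trying to figure out why this does not work for me when it worked perfectly well for the package's creator. And since this is not a problem with the package, I would rather not pester him.
So, in short: any ideas on how to run the code above without getting the error message? Or on what causes the error for me, but not for others?
Also, here is a short reproducible data sample: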
structure(list(GEOID = c("06083002013", "06083002013", "06083002013",
"06083002011", "06083002011", "06083002011", "06061020711", "06061020711",
"06061020711", "06061020712", "06061020712", "06061020712", "06061020805",
"06061020805", "06061020805", "06061020713", "06061020713", "06061020713",
"06083002502", "06083002502", "06083002502", "06061020715", "06061020715",
"06061020715", "06061020714"), variable = c("white", "black",
"hispanic", "white", "black", "hispanic", "white", "black", "hispanic",
"white", "black", "hispanic", "white", "black", "hispanic", "white",
"black", "hispanic", "white", "black", "hispanic", "white", "black",
"hispanic", "white"), estimate = c(2291, 0, 471, 1875, 30, 2720,
3339, 117, 471, 2628, 9, 809, 2887, 11, 571, 2679, 5, 610, 757,
57, 6169, 2532, 20, 223, 3132), moe = c(331, 13, 246, 262, 33,
384, 420, 146, 160, 338, 19, 357, 437, 17, 280, 382, 11, 391,
232, 50, 382, 309, 30, 149, 438), urban_name = c("Santa Maria",
"Santa Maria", "Santa Maria", "Santa Maria", "Santa Maria", "Santa Maria",
"Roseville", "Roseville", "Roseville", "Roseville", "Roseville",
"Roseville", "Roseville", "Roseville", "Roseville", "Roseville",
"Roseville", "Roseville", "Santa Maria", "Santa Maria", "Santa Maria",
"Roseville", "Roseville", "Roseville", "Roseville")), row.names = c("1",
"2", "3", "4", "5", "6", "34", "35", "36", "37", "38", "39",
"73", "74", "75", "76", "77", "78", "115", "116", "117", "127",
"128", "129", "130"), class = "data.frame")
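Solution
Based on the code shown, split is a base R function, so the grouping column can be extracted with $ or [[, or by wrapping the call in with(). Since "Inglewood" does not appear in the dput data, we use "Roseville" instead: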
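library(dplyr)
library(purrr)        # map_dbl(), used in the full code further down
library(segregation)  # entropy()
ca_city_data %>%
  filter(urban_name == "Roseville") %>%
  split(.$GEOID)
Output: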
$`06061020711`
         GEOID variable estimate moe urban_name
34 06061020711    white     3339 420  Roseville
35 06061020711    black      117 146  Roseville
36 06061020711 hispanic      471 160  Roseville

$`06061020712`
         GEOID variable estimate moe urban_name
37 06061020712    white     2628 338  Roseville
38 06061020712    black        9  19  Roseville
39 06061020712 hispanic      809 357  Roseville

$`06061020713`
         GEOID variable estimate moe urban_name
76 06061020713    white     2679 382  Roseville
77 06061020713    black        5  11  Roseville
78 06061020713 hispanic      610 391  Roseville

$`06061020714`
          GEOID variable estimate moe urban_name
130 06061020714    white     3132 438  Roseville

$`06061020715`
          GEOID variable estimate moe urban_name
127 06061020715    white     2532 309  Roseville
128 06061020715    black       20  30  Roseville
129 06061020715 hispanic      223 149  Roseville

$`06061020805`
         GEOID variable estimate moe urban_name
73 06061020805    white     2887 437  Roseville
74 06061020805    black       11  17  Roseville
75 06061020805 hispanic      571 280  Roseville
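As for why the original call worked on one machine but not on another: as far as I know, the formula form of split() for data frames was only added in R 4.1.0, so an older R would not understand ~GEOID as a grouping and could fail with the unique() error. That is my best guess and I have not confirmed it against the asker's setup. A minimal sketch of the difference in base R, using a small made-up data frame (toy is hypothetical, not the census data):

```r
# Hypothetical toy data standing in for ca_city_data.
toy <- data.frame(
  GEOID = c("a", "a", "b"),
  estimate = c(1, 2, 3)
)

# Portable across R versions: pass the grouping vector itself.
by_vector <- split(toy, toy$GEOID)

# The formula form is only attempted on R >= 4.1.0 (assumed cutoff).
if (getRversion() >= "4.1.0") {
  by_formula <- split(toy, ~GEOID)
  # Both forms should produce the same groups.
  stopifnot(identical(names(by_formula), names(by_vector)))
}

names(by_vector)
```

Checking getRversion() on both machines would confirm or rule this out. Passing the column directly, as in split(.$GEOID), sidesteps the question either way.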
Using the full code:
ca_city_data %>%
  filter(urban_name == "Roseville") %>%
  split(.$GEOID) %>%
  map_dbl(~{
    entropy(
      data = .x,
      group = "variable",
      weight = "estimate",
      base = 4
    )
  }) %>%
  as_tibble(rownames = "GEOID") %>%
  rename(entropy = value)
Output:
# A tibble: 6 x 2
  GEOID       entropy
  <chr>         <dbl>
1 06061020711   0.358
2 06061020712   0.406
3 06061020713   0.354
4 06061020714   0
5 06061020715   0.232
6 06061020805   0.338
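To sanity-check these numbers without purrr or segregation, the per-tract calculation can be sketched in base R. This assumes that entropy() here amounts to the Shannon entropy of the group shares in the given log base. That is my assumption, not something taken from the package source. shannon_entropy is a hypothetical helper, and the rows are copied from the dput data in the question:

```r
# Hypothetical helper: Shannon entropy of group shares in the given log base.
shannon_entropy <- function(weights, base = exp(1)) {
  p <- weights / sum(weights)
  p <- p[p > 0]  # treat 0 * log(0) as 0
  -sum(p * log(p, base = base))
}

# Rows for two Roseville tracts, copied from the dput() data in the question.
toy <- data.frame(
  GEOID = c(rep("06061020711", 3), "06061020714"),
  variable = c("white", "black", "hispanic", "white"),
  estimate = c(3339, 117, 471, 3132)
)

# Split on the column itself, then compute one number per tract.
pieces <- split(toy, toy$GEOID)
result <- vapply(
  pieces,
  function(d) shannon_entropy(d$estimate, base = 4),
  numeric(1)
)
round(result, 3)
```

On these rows the sketch should give roughly 0.358 for tract 06061020711 and 0 for the single-group tract 06061020714, in line with the output above.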
It can also be done with nest_by:
ca_city_data %>%
  nest_by(urban_name, GEOID) %>%
  transmute(out = entropy(data = data, group = "variable",
                          weight = "estimate", base = 4)) %>%
  ungroup()
Output:
# A tibble: 9 x 3
  urban_name  GEOID         out
  <chr>       <chr>       <dbl>
1 Roseville   06061020711 0.358
2 Roseville   06061020712 0.406
3 Roseville   06061020713 0.354
4 Roseville   06061020714 0
5 Roseville   06061020715 0.232
6 Roseville   06061020805 0.338
7 Santa Maria 06083002011 0.513
8 Santa Maria 06083002013 0.329
9 Santa Maria 06083002502 0.281