Base R split() function causes "Error in unique.default(x, nmax = nmax)"

Problem description

I am using the tidycensus and segregation packages for an analysis.

To create my data, I run this:

library(tidycensus)
library(tidyverse)
library(segregation)
library(tigris)
library(sf)

los.angeles.indices <- get_acs(
  geography = "tract",
  variables = c(
    white = "B03002_003",
    black = "B03002_004",
    hispanic = "B03002_012"
  ), 
  state = "CA",
  geometry = TRUE,
  year = 2012
) 

california.cities <- get_acs(
  geography = "place",
  state = "CA",
  variables = "B01001_001",
  geometry = TRUE,
  year = 2012,
  survey = "acs1"
) %>%
  filter(estimate >= 100000) %>%
  transmute(urban_name = str_remove(NAME, 
                                    fixed(" city, California")))

ca_city_data <- los.angeles.indices %>%
  st_join(california.cities, left = FALSE) %>%
  select(-NAME) %>%
  st_drop_geometry()

After that, I tried to run this:

inglewood_entropy <- ca_city_data %>%
  filter(urban_name == "Inglewood") %>%
  split(~GEOID) %>%
  map_dbl(~{
    entropy(
      data = .x,
      group = "variable",
      weight = "estimate",
      base = 4
    )
  }) %>%
  as_tibble(rownames = "GEOID") %>%
  rename(entropy = value)

Unfortunately, it results in this error:

Error in unique.default(x, nmax = nmax) : 
  unique() applies only to vectors

The final result should look like this:

inglewood_entropy
#> # A tibble: 50 × 2
#>    GEOID       entropy
#>    <chr>         <dbl>
#>  1 06037234902   0.576
#>  2 06037235100   0.453
#>  3 06037235201   0.548
#>  4 06037235202   0.550
#>  5 06037237900   0.357
#>  6 06037238000   0.420
#>  7 06037238100   0.421
#>  8 06037238400   0.430
#>  9 06037276100   0.590
#> 10 06037277100   0.506
#> # … with 40 more rows

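What is really strange is that I asked the creator of the segregation package about this, and he was able to run the code without any problem!

You can see our brief discussion, in which the code worked for him, here: segregation GitHub issue

I am fairly sure the problem is in the split(~GEOID) part of the code, but I am not certain.

In any case, I am trying to work out why this fails for me when it ran perfectly well for the package's creator. And since this is not a problem with the package, I would rather not pester him.

So, in short: any ideas on how to run the code above without getting the error message? Or what is causing the error for me but not for others?

Also, here is a short reproducible sample of the data: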

structure(list(GEOID = c("06083002013", "06083002013", "06083002013", 
"06083002011", "06083002011", "06083002011", "06061020711", "06061020711", 
"06061020711", "06061020712", "06061020712", "06061020712", "06061020805", 
"06061020805", "06061020805", "06061020713", "06061020713", "06061020713", 
"06083002502", "06083002502", "06083002502", "06061020715", "06061020715", 
"06061020715", "06061020714"), variable = c("white", "black", 
"hispanic", "white", "black", "hispanic", "white", "black", "hispanic", 
"white", "black", "hispanic", "white", "black", "hispanic", "white", 
"black", "hispanic", "white", "black", "hispanic", "white", "black", 
"hispanic", "white"), estimate = c(2291, 0, 471, 1875, 30, 2720, 
3339, 117, 471, 2628, 9, 809, 2887, 11, 571, 2679, 5, 610, 757, 
57, 6169, 2532, 20, 223, 3132), moe = c(331, 13, 246, 262, 33, 
384, 420, 146, 160, 338, 19, 357, 437, 17, 280, 382, 11, 391, 
232, 50, 382, 309, 30, 149, 438), urban_name = c("Santa Maria", 
"Santa Maria", "Santa Maria", "Santa Maria", "Santa Maria", "Santa Maria", 
"Roseville", "Roseville", "Roseville", "Roseville", "Roseville", 
"Roseville", "Roseville", "Roseville", "Roseville", "Roseville", 
"Roseville", "Roseville", "Santa Maria", "Santa Maria", "Santa Maria", 
"Roseville", "Roseville", "Roseville", "Roseville")), row.names = c("1", 
"2", "3", "4", "5", "6", "34", "35", "36", "37", "38", "39", 
"73", "74", "75", "76", "77", "78", "115", "116", "117", "127", 
"128", "129", "130"), class = "data.frame")

Tags: r

Solution


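In the code shown, split is a base R function, so the grouping column can be passed to it by extracting it with $, [[, or with(). Since "Inglewood" does not appear in the dput data, we use "Roseville" instead.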

library(dplyr)
ca_city_data %>%
  filter(urban_name == "Roseville") %>%
  split(.$GEOID)

-output

$`06061020711`
         GEOID variable estimate moe urban_name
34 06061020711    white     3339 420  Roseville
35 06061020711    black      117 146  Roseville
36 06061020711 hispanic      471 160  Roseville

$`06061020712`
         GEOID variable estimate moe urban_name
37 06061020712    white     2628 338  Roseville
38 06061020712    black        9  19  Roseville
39 06061020712 hispanic      809 357  Roseville

$`06061020713`
         GEOID variable estimate moe urban_name
76 06061020713    white     2679 382  Roseville
77 06061020713    black        5  11  Roseville
78 06061020713 hispanic      610 391  Roseville

$`06061020714`
          GEOID variable estimate moe urban_name
130 06061020714    white     3132 438  Roseville

$`06061020715`
          GEOID variable estimate moe urban_name
127 06061020715    white     2532 309  Roseville
128 06061020715    black       20  30  Roseville
129 06061020715 hispanic      223 149  Roseville

$`06061020805`
         GEOID variable estimate moe urban_name
73 06061020805    white     2887 437  Roseville
74 06061020805    black       11  17  Roseville
75 06061020805 hispanic      571 280  Roseville
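
As a minimal sketch of the point above, the three ways of extracting the grouping column give the same list, while the formula form split(df, ~GEOID) depends on the R version. To my knowledge, formula support in split() for data frames arrived in R 4.1.0, which would explain why the same code ran for the package author but not for the asker; treat that cutoff as an assumption and check it against your own R.version.string. The toy data frame is a made-up stand-in for ca_city_data.

```r
# Toy stand-in for ca_city_data (values are illustrative only)
toy <- data.frame(
  GEOID    = c("A", "A", "B", "B", "C"),
  variable = c("white", "black", "white", "black", "white"),
  estimate = c(10, 5, 8, 2, 7),
  stringsAsFactors = FALSE
)

# Three equivalent ways to hand split() a grouping vector
by_dollar  <- split(toy, toy$GEOID)
by_bracket <- split(toy, toy[["GEOID"]])
by_with    <- split(toy, with(toy, GEOID))

identical(by_dollar, by_bracket)  # TRUE
identical(by_dollar, by_with)     # TRUE

# Only attempt the formula interface on R >= 4.1.0 (assumed cutoff)
if (getRversion() >= "4.1.0") {
  by_formula <- split(toy, ~GEOID)
  print(names(by_formula))
} else {
  message("split(df, ~GEOID) is not supported on ", R.version.string)
}
```

Passing the column itself, as in split(.$GEOID), avoids the version dependence altogether, which is why the answer's code runs on older installations as well.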

With the full code

ca_city_data %>%
  filter(urban_name == "Roseville") %>%
  split(.$GEOID) %>%
  map_dbl(~{
    entropy(
      data = .x,
      group = "variable",
      weight = "estimate",
      base = 4
    )
  }) %>%
  as_tibble(rownames = "GEOID") %>% 
  rename(entropy = value)

-output

# A tibble: 6 x 2
  GEOID       entropy
  <chr>         <dbl>
1 06061020711   0.358
2 06061020712   0.406
3 06061020713   0.354
4 06061020714   0    
5 06061020715   0.232
6 06061020805   0.338
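
The entropy values above can be cross-checked by hand. The sketch below assumes that entropy() returns the Shannon entropy of the group shares, -sum(p * log(p, base)), with zero-count groups contributing nothing. That assumption reproduces the figures shown for the sample data, but it is not taken from the package source. shannon_entropy is a hypothetical helper, and only the first two Roseville tracts from the dput data are included.

```r
# Hypothetical helper: Shannon entropy of weighted group shares
shannon_entropy <- function(weights, base = exp(1)) {
  p <- weights / sum(weights)
  p <- p[p > 0]                 # treat 0 * log(0) as 0
  -sum(p * log(p, base = base))
}

# Two Roseville tracts copied from the dput sample
roseville <- data.frame(
  GEOID    = rep(c("06061020711", "06061020712"), each = 3),
  variable = rep(c("white", "black", "hispanic"), times = 2),
  estimate = c(3339, 117, 471, 2628, 9, 809),
  stringsAsFactors = FALSE
)

# Split on the GEOID column itself, which does not need formula support
by_tract <- split(roseville, roseville$GEOID)

# One entropy value per tract, using base 4 as in the question
result <- vapply(by_tract,
                 function(d) shannon_entropy(d$estimate, base = 4),
                 numeric(1))
round(result, 3)
```

The two rounded values should come out as 0.358 and 0.406, matching the first two rows of the output above.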

It can also be done with nest_by

ca_city_data %>%
  nest_by(urban_name, GEOID) %>%
  transmute(out = entropy(data = data, group = "variable",
                          weight = "estimate", base = 4)) %>%
  ungroup()

-output

# A tibble: 9 x 3
  urban_name  GEOID         out
  <chr>       <chr>       <dbl>
1 Roseville   06061020711 0.358
2 Roseville   06061020712 0.406
3 Roseville   06061020713 0.354
4 Roseville   06061020714 0    
5 Roseville   06061020715 0.232
6 Roseville   06061020805 0.338
7 Santa Maria 06083002011 0.513
8 Santa Maria 06083002013 0.329
9 Santa Maria 06083002502 0.281
