Adding a gender column based on names

Problem description

demo_df <- data_frame(id = c(1,2,3), names = c("Hillary", "Madison", "John"), stock = c(43,5,2), bill = c(43,112,33))

How can I add a gender identifier based on the names column? Expected output:

demo_df <- data_frame(id = c(1,2,3), names = c("Hillary", "Madison", "John"), gender = c("female", "female", "male"), stock = c(43,5,2), bill = c(43,112,33))

I tried this:

library(gender)
test <- gender_df(demo_df, method = "demo",
           name_col = "name", year_col = c("1900", "2000"))

But I get this error:

Error in gender_df(demo_df, method = "demo", name_col = "name") : 
  year_col %in% names(data) is not TRUE

Tags: r

Solution


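Use gender() instead of gender_df().

Note that gender() automatically sorts its output alphabetically by name, so simply adding the output to demo_df as a new vector will not work, because the row order may not match.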
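To make the ordering problem concrete, here is a minimal base R sketch. The lookup data frame is a hand-built, hypothetical stand-in for what gender() returns (one row per name, sorted alphabetically); it is an assumption for illustration, not actual package output.

```r
demo_df <- data.frame(
  id    = c(1, 2, 3),
  names = c("Hillary", "Madison", "John"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the result of gender(): sorted by name.
lookup <- data.frame(
  name   = c("Hillary", "John", "Madison"),
  gender = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

# Naive assignment pairs rows by position, not by name.
naive <- demo_df
naive$gender <- lookup$gender

# Row 2 (Madison) receives John's value and row 3 (John) receives Madison's.
print(naive)
```

Because the assignment is positional, any difference in row order between demo_df and the lookup silently mislabels rows.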

Two options for dealing with this:
1. Sort demo_df alphabetically by name before calling gender()

library(dplyr)

demo_df %>% 
  arrange(names) %>%
  mutate(gender = gender::gender(demo_df$names)$gender)

2. Use a join, such as dplyr::inner_join, to merge demo_df with the data frame returned by gender(), joining on the names column.

gender_df <- gender::gender(demo_df$names) %>% 
  select(names = name, gender)

inner_join(demo_df, gender_df, by = "names")  

Output:

  id   names stock bill gender
1  1 Hillary    43   43 female
2  2 Madison     5  112 female
3  3    John     2   33   male

All of this is also possible in base R, apart from the gender imputation part. I just prefer dplyr.
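As a rough sketch of the base R route, match() can align a lookup table to demo_df by name while keeping the original row order. The lookup data frame below is a hypothetical stand-in for the name and gender columns returned by gender(); in real use it would come from the package call instead.

```r
demo_df <- data.frame(
  id    = c(1, 2, 3),
  names = c("Hillary", "Madison", "John"),
  stock = c(43, 5, 2),
  bill  = c(43, 112, 33),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the name/gender columns of the gender() result.
lookup <- data.frame(
  name   = c("Hillary", "John", "Madison"),
  gender = c("female", "male", "female"),
  stringsAsFactors = FALSE
)

# For each name in demo_df, find its row position in lookup (NA if absent).
idx <- match(demo_df$names, lookup$name)

# Index the lookup by those positions so values line up with demo_df rows.
result <- demo_df
result$gender <- lookup$gender[idx]

print(result)
```

Unlike inner_join, this keeps rows whose names are missing from the lookup and fills their gender with NA, which may or may not be what you want.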

