Group by name, rank by number of appearances, and add a count, while dropping names that are not tied within the top 2 of each state (descending order)?

Problem description

For example, I have a dataset that looks like this:

name      state
Smith     NY
Anthony   CA
James     MA
Henry     CA
Andrews   NY
Helen     CA
Smith     NY
Smith     NY
Anthony   CA
Andrews   NY
Richard   MA
Richard   MA
Richard   MA
Anthony   CA
Smith     MA
Jeffries  CA
Conrad    NY
Hanes     NY
James     MA
Conrad    NY
Conrad    NY
Helen     CA

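In the end I want something like the table below. The states are in alphabetical order. Within each state, the name that appears most often comes first, followed by the next most frequent name. I keep only the top two names in each group (state), then add two columns: the name's rank within the state and the number of rows in which it appears.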

name     state  Rank  Count
Anthony  CA     1     3
Anthony  CA     1     3
Anthony  CA     1     3
Helen    CA     2     2
Helen    CA     2     2
Richard  MA     1     3
Richard  MA     1     3
Richard  MA     1     3
James    MA     2     2
James    MA     2     2
Smith    NY     1     3
Smith    NY     1     3
Smith    NY     1     3
Conrad   NY     1     3
Conrad   NY     1     3
Conrad   NY     1     3

Tags: r, list, count, rank

Solution


Maybe this helps:

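library(dplyr)

df1 %>%
   add_count(name, state) %>%          # n = number of rows for each name within a state
   group_by(state) %>%
   mutate(Rank = dense_rank(-n)) %>%   # rank by descending count; tied names share a rank
   arrange(state, Rank) %>%
   filter(Rank %in% 1:2)               # keep ranks 1 and 2 in each state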
# A tibble: 18 x 4
# Groups:   state [3]
   name    state     n  Rank
   <chr>   <chr> <int> <int>
 1 Anthony CA        3     1
 2 Anthony CA        3     1
 3 Anthony CA        3     1
 4 Helen   CA        2     2
 5 Helen   CA        2     2
 6 Richard MA        3     1
 7 Richard MA        3     1
 8 Richard MA        3     1
 9 James   MA        2     2
10 James   MA        2     2
11 Smith   NY        3     1
12 Smith   NY        3     1
13 Smith   NY        3     1
14 Conrad  NY        3     1
15 Conrad  NY        3     1
16 Conrad  NY        3     1
17 Andrews NY        2     2
18 Andrews NY        2     2
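
Note that the result above still contains Andrews in NY. dense_rank() leaves no gap after a tie, so with Smith and Conrad tied at rank 1, Andrews becomes rank 2. The desired output in the question drops Andrews. If the intent is that tied names use up the top-2 slots, one possible variation is to rank the distinct names with min_rank(), which skips ranks after a tie (1, 1, 3, ...), and then join back to the original rows. This is only a sketch under that reading of the question. The names top2 and result are illustrative, and a three-way tie for first place would keep all three names.

```r
library(dplyr)

# Same data as the question's df1, rebuilt here so the sketch runs on its own.
df1 <- data.frame(
  name = c("Smith", "Anthony", "James", "Henry", "Andrews", "Helen", "Smith",
           "Smith", "Anthony", "Andrews", "Richard", "Richard", "Richard",
           "Anthony", "Smith", "Jeffries", "Conrad", "Hanes", "James",
           "Conrad", "Conrad", "Helen"),
  state = c("NY", "CA", "MA", "CA", "NY", "CA", "NY", "NY", "CA", "NY", "MA",
            "MA", "MA", "CA", "MA", "CA", "NY", "NY", "MA", "NY", "NY", "CA"),
  stringsAsFactors = FALSE
)

# One row per (state, name) with its count, ranked within each state.
# min_rank() leaves a gap after ties, so two names tied for first
# take both of the top-2 slots.
top2 <- df1 %>%
  count(state, name) %>%
  rename(Count = n) %>%
  group_by(state) %>%
  mutate(Rank = min_rank(desc(Count))) %>%
  ungroup() %>%
  filter(Rank <= 2)

# Join back to the original rows so each kept name repeats once per appearance.
result <- df1 %>%
  inner_join(top2, by = c("state", "name")) %>%
  arrange(state, Rank) %>%
  select(name, state, Rank, Count)

print(result)
```

For this data, the sketch keeps Anthony and Helen in CA, Richard and James in MA, and Smith and Conrad in NY, for 16 rows in total. Andrews is dropped because its min_rank() within NY is 3.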

Data

df1 <- structure(list(name = c("Smith", "Anthony", "James", "Henry", 
"Andrews", "Helen", "Smith", "Smith", "Anthony", "Andrews", "Richard", 
"Richard", "Richard", "Anthony", "Smith", "Jeffries", "Conrad", 
"Hanes", "James", "Conrad", "Conrad", "Helen"), state = c("NY", 
"CA", "MA", "CA", "NY", "CA", "NY", "NY", "CA", "NY", "MA", "MA", 
"MA", "CA", "MA", "CA", "NY", "NY", "MA", "NY", "NY", "CA")),
class = "data.frame", row.names = c(NA, 
-22L))
