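r - Group by name, rank by number of appearances and add a count, while dropping names that are not tied within the top 2 of each state (descending)?
Question
For example, I have a dataset that looks like this: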
name | state
Smith NY
Anthony CA
James MA
Henry CA
Andrews NY
Helen CA
Smith NY
Smith NY
Anthony CA
Andrews NY
Richard MA
Richard MA
Richard MA
Anthony CA
Smith MA
Jeffries CA
Conrad NY
Hanes NY
James MA
Conrad NY
Conrad NY
Helen CA
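In the end I want something like this. Note that the states are in alphabetical order, and that within each state the most frequent name comes first, followed by the next most frequent. I only keep the top two names in each group (state). I then add columns giving each name's rank and its count, i.e. the number of rows in which it appears. (In NY, Smith and Conrad tie with 3 appearances each, so both get rank 1 and together fill the top two; Andrews is dropped.)
name | state | Rank | Count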
Anthony CA 1 3
Anthony CA 1 3
Anthony CA 1 3
Helen CA 2 2
Helen CA 2 2
Richard MA 1 3
Richard MA 1 3
Richard MA 1 3
James MA 2 2
James MA 2 2
Smith NY 1 3
Smith NY 1 3
Smith NY 1 3
Conrad NY 1 3
Conrad NY 1 3
Conrad NY 1 3
Answer
Maybe this helps:
library(dplyr)
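df1 %>%
  add_count(name, state) %>%          # n = number of rows for each (name, state)
  group_by(state) %>%
  mutate(Rank = dense_rank(-n)) %>%   # rank counts within each state, highest first
  arrange(state, Rank) %>%
  filter(Rank %in% 1:2)               # keep the top two ranks per state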
# A tibble: 18 x 4
# Groups: state [3]
name state n Rank
<chr> <chr> <int> <int>
1 Anthony CA 3 1
2 Anthony CA 3 1
3 Anthony CA 3 1
4 Helen CA 2 2
5 Helen CA 2 2
6 Richard MA 3 1
7 Richard MA 3 1
8 Richard MA 3 1
9 James MA 2 2
10 James MA 2 2
11 Smith NY 3 1
12 Smith NY 3 1
13 Smith NY 3 1
14 Conrad NY 3 1
15 Conrad NY 3 1
16 Conrad NY 3 1
17 Andrews NY 2 2
18 Andrews NY 2 2
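Note that this output has 18 rows and keeps Andrews in NY, whereas the desired output in the question stops at Conrad. The difference comes from `dense_rank()`: tied values share a rank and the next value gets the very next integer, so after the Smith/Conrad tie at rank 1, Andrews gets rank 2. `min_rank()` instead skips ranks after a tie. A small illustration on the three NY counts (Smith, Conrad, Andrews):

```r
library(dplyr)

# Appearance counts for Smith, Conrad and Andrews in NY
n <- c(3L, 3L, 2L)

dense_rank(-n)  # no gap after the tie: 1 1 2
min_rank(-n)    # the tie uses up rank 2: 1 1 3
```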
Data
df1 <- structure(list(name = c("Smith", "Anthony", "James", "Henry",
"Andrews", "Helen", "Smith", "Smith", "Anthony", "Andrews", "Richard",
"Richard", "Richard", "Anthony", "Smith", "Jeffries", "Conrad",
"Hanes", "James", "Conrad", "Conrad", "Helen"), state = c("NY",
"CA", "MA", "CA", "NY", "CA", "NY", "NY", "CA", "NY", "MA", "MA",
"MA", "CA", "MA", "CA", "NY", "NY", "MA", "NY", "NY", "CA")),
class = "data.frame", row.names = c(NA,
-22L))
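If ties should use up the top-two slots, as in the desired output, one option is a sketch like the following (not the answerer's code). It counts each (state, name) pair once, ranks those per-name counts with `min_rank()`, and joins the surviving names back onto the original rows. The ranking is done on one row per name on purpose: applying `min_rank()` to the row-level counts would make Anthony's three tied rows occupy ranks 1 to 3 in CA, which would push Helen to rank 4. The column names `Rank` and `Count` follow the question; `top` and `result` are just illustrative names.

```r
library(dplyr)

df1 <- data.frame(
  name = c("Smith", "Anthony", "James", "Henry", "Andrews", "Helen", "Smith",
           "Smith", "Anthony", "Andrews", "Richard", "Richard", "Richard",
           "Anthony", "Smith", "Jeffries", "Conrad", "Hanes", "James",
           "Conrad", "Conrad", "Helen"),
  state = c("NY", "CA", "MA", "CA", "NY", "CA", "NY", "NY", "CA", "NY", "MA",
            "MA", "MA", "CA", "MA", "CA", "NY", "NY", "MA", "NY", "NY", "CA"),
  stringsAsFactors = FALSE
)

# One row per (state, name) with its number of appearances
top <- df1 %>%
  count(state, name, name = "Count") %>%
  group_by(state) %>%
  mutate(Rank = min_rank(desc(Count))) %>%  # ties share a rank and skip the next one
  ungroup() %>%
  filter(Rank <= 2)

# Attach Rank and Count back to every original row of the kept names
result <- df1 %>%
  inner_join(top, by = c("state", "name")) %>%
  arrange(state, Rank) %>%
  select(name, state, Rank, Count)

result
```

With the data above this keeps Anthony and Helen (CA), Richard and James (MA), and Smith and Conrad (NY), for 16 rows in total.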