r - How to get the most frequent distinct values from a dataset
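Problem description
I'm playing around with Los Angeles police data that I obtained through the Office of the Mayor's website. For 2017 to 2018, I'm trying to see which charges were issued in Council District 5 and how many times each specific charge was issued. CHARGE and CITY_COUNCIL_DIST are the two variables/columns I'm looking at.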
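I used table(ArrestData$CHARGE) to count how often each distinct value occurs.
I realized there are over 2,400 unique entries, so most of them were omitted from the output. I'd like to know whether there is code to see the 5 charges the LAPD issues most often.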
Also, I'm trying to find the top 5 charges within one specific Council District (again, another variable/column). Is there code for this?
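Aside: how do I add sample data to my post? What are the steps for doing this in RStudio? Someone asked me to do this on a previous post, but I don't know how. They told me to use dput(head(df, n)), but my data is too large even with 10 rows. They told me to do it through an R script, but I'm not sure what they meant.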
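Solution
Posting a link to the actual dataset, or to sample data, would help in building a solution and would help the post meet the reproducibility standards others have mentioned. For this example, we'll create a dataset explicitly.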
# 90 rows: CHARGEA x18, CHARGEB x16, CHARGEC x14, ..., CHARGEI x2
ArrestData <- data.frame(
  CHARGE = rep(paste0("CHARGE", LETTERS[1:9]), times = seq(18, 2, by = -2)),
  # c(0, 5) is recycled down the 90 rows, so districts alternate 0, 5, 0, 5, ...
  CITY_COUNCIL_DIST = c(0, 5)
)
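On the aside about sharing sample data: one possible approach, sketched here under the assumption that the data frame is too wide rather than too long, is to keep only the relevant columns before calling dput(). The names arrests_demo and small_sample are made up for this illustration and stand in for the real data.

```r
# Hypothetical stand-in for the real data frame; swap in your own object.
arrests_demo <- data.frame(
  CHARGE = rep(c("CHARGEA", "CHARGEB", "CHARGEC"), times = c(5, 3, 2)),
  CITY_COUNCIL_DIST = c(0, 5),          # recycled down the 10 rows
  OTHER_COLUMN = seq_len(10),           # placeholder for columns you do not need to share
  stringsAsFactors = FALSE
)

# Keep only the columns the question is about, then take the first few rows.
small_sample <- head(arrests_demo[, c("CHARGE", "CITY_COUNCIL_DIST")], 6)

# dput() prints R code that recreates the object; paste that printed text into the post.
dput(small_sample)
```

Running these lines from a script file or from the console gives the same printed text; only the dput() output needs to go into the post.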
Assuming your dataset is named ArrestData and your CHARGE and CITY_COUNCIL_DIST columns are named as shown, this code should work. The code below returns the top 5 CHARGE values for every CITY_COUNCIL_DIST.
# Install the package once if you do not have it
# install.packages("dplyr")
# dplyr re-exports the %>% pipe, so magrittr does not need to be loaded separately
library(dplyr)
ArrestData %>%
  # count arrests for each charge within each district
  group_by(CHARGE, CITY_COUNCIL_DIST) %>%
  summarize(count = n()) %>%
  arrange(CITY_COUNCIL_DIST, desc(count)) %>%
  # rank charges within each district; tied counts share the lowest rank
  group_by(CITY_COUNCIL_DIST) %>%
  mutate(rank = rank(desc(count), ties.method = "min")) %>%
  filter(rank <= 5)
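One consequence of ties.method = "min" is that tied counts share a rank, so keeping rank <= 5 can return more than 5 rows for a district. A small base R illustration, using made-up counts rather than the real data:

```r
# Hypothetical counts for seven charges in one district, already sorted.
counts <- c(9, 8, 8, 7, 6, 6, 5)

# Negating the counts ranks the largest count first, mirroring desc(count).
ranks <- rank(-counts, ties.method = "min")
print(ranks)

# The two charges tied on 6 both get rank 5, so six rows pass the filter.
kept <- counts[ranks <= 5]
print(kept)
```

If exactly 5 rows per district are required, the ties have to be broken some other way, for example with ties.method = "first".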
To keep only the results for CITY_COUNCIL_DIST 5, change the filter statement to something like the following (depending on the actual values in your CITY_COUNCIL_DIST column):
filter(rank<=5, CITY_COUNCIL_DIST==5)
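The question also asks for the 5 most frequent charges overall, which the pipeline above does not return directly because it ranks within each district. A minimal base R sketch that builds on the table() call from the question; arrests_demo is a hypothetical toy data frame, not the real LAPD data:

```r
# Hypothetical toy data: 7 charges with counts 14, 12, 10, 8, 6, 4, 2.
arrests_demo <- data.frame(
  CHARGE = rep(paste0("CHARGE", LETTERS[1:7]), times = seq(14, 2, by = -2)),
  CITY_COUNCIL_DIST = c(0, 5),          # recycled, so districts alternate row by row
  stringsAsFactors = FALSE
)

# table() counts each distinct CHARGE, sort() orders the counts from
# largest to smallest, and head() keeps the first 5.
top5_overall <- head(sort(table(arrests_demo$CHARGE), decreasing = TRUE), 5)
print(top5_overall)

# The same idea restricted to a single district: subset the rows first.
district5 <- arrests_demo[arrests_demo$CITY_COUNCIL_DIST == 5, ]
top5_district5 <- head(sort(table(district5$CHARGE), decreasing = TRUE), 5)
print(top5_district5)
```

With the toy data this keeps CHARGEA through CHARGEE in both cases. Unlike the rank-based filter, head() always cuts at 5 entries, so a charge tied with the fifth one is dropped.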