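r - Aggregate subset values by string name and calculate the percentage of each value in R
Problem description
I have a subset. I want to aggregate the hobby column so that each hobby gets its own column name and value. The ideal result looks like step 2, but I would also be happy with help on step 1 alone. There are 25 values in total in this subset. In step 2, I get each percentage by dividing the hobby's count by the total count in the subset. For example, the percentage for playGolf is 2/25 = 8%. (Note: I have replaced the empty rows with NA and kept NA as its own column.)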
df:
country hobby
<chr> <chr>
7 Russia Play Golf
12 Russia Reading
17 Russia Reading
20 Russia Reading
21 Russia Reading
22 Russia Cycling
28 Russia Reading
33 Russia Reading
35 Russia Reading
41 Russia Surfing
48 Russia NA
61 Russia Gaming
65 Russia Reading
70 Russia Running
74 Russia Reading
79 Russia Running
86 Russia Reading
87 Russia Gaming
90 Russia Reading
92 Russia Prefer not say
95 Russia Play Golf
96 Russia Gaming
97 Russia Reading
98 Russia Prefer not say
108 Russia Reading
Expected step 1:
country playGolf Reading Cycling Surfing Gaming Running PreNSay NA
Russia 2 13 1 1 3 2 2 1
Expected step 2:
country playGolf(%) Reading(%) Cycling(%) Surfing(%) Gaming(%) Running(%) PreNSay(%) NA(%)
Russia 8 52 4 4 12 8 8 4
After this, I will combine it with other subsets like this (but I can handle that part myself):
country playGolf(%) Reading(%) Cycling(%) Surfing(%) Gaming(%) Running(%) PreNSay(%) NA(%)
Russia 8 52 4 4 12 8 8 4
Poland 12 24 3 5 10 2 5 1
..
etc...
How can I do this? Thanks!
Solution
We can use table to get the frequency counts:
table(df1)
-output
# hobby
#country Cycling Gaming Play Golf Prefer not say Reading Running Surfing
# Russia 1 3 2 2 13 2 1
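Note that the output above has no NA column, because table drops missing values by default. A minimal sketch of one way to keep them, assuming the same data (rebuilt compactly here with rep from the counts in the question, so row order and row names differ from the original df1):

```r
# Rebuild the question's data from its counts (row order is not preserved)
df1 <- data.frame(
  country = "Russia",
  hobby = rep(
    c("Play Golf", "Reading", "Cycling", "Surfing", NA,
      "Gaming", "Running", "Prefer not say"),
    times = c(2, 13, 1, 1, 1, 3, 2, 2)
  )
)

# useNA = "ifany" adds an NA level only to variables that contain NA
counts <- table(df1, useNA = "ifany")
counts
```

With this option the counts sum to 25, matching the total in the question, and the single missing hobby gets a column of its own.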
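The percentages can then be obtained with prop.table. Because table drops NA by default, they are computed over the 24 non-missing values (Reading is 13/24, about 54%) rather than the 25 values in the question; passing useNA = "ifany" to table keeps the NA column.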
round(100 * prop.table(table(df1)))
-output
# hobby
#country Cycling Gaming Play Golf Prefer not say Reading Running Surfing
# Russia 4 12 8 8 54 8 4
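The percentages above are taken over the 24 non-missing values. As a hedged sketch of how the question's 25-value percentages, and the later per-country combination, might be obtained: keep NA with useNA and pass margin = 1 so that each country is normalised by its own row total. The data frame df2 and its Poland rows are hypothetical, invented only to show the row-wise behaviour.

```r
# Russia rows follow the counts in the question; the Poland rows are
# made up purely for illustration and are not from the original post
df2 <- data.frame(
  country = c(rep("Russia", 25), rep("Poland", 4)),
  hobby = c(
    rep(c("Play Golf", "Reading", "Cycling", "Surfing", NA,
          "Gaming", "Running", "Prefer not say"),
        times = c(2, 13, 1, 1, 1, 3, 2, 2)),
    "Reading", "Reading", "Gaming", NA
  )
)

# Keep the missing hobbies as their own column
counts <- table(df2, useNA = "ifany")

# margin = 1 divides each row by its own total, so each country is
# expressed as a percentage of that country's responses
pct <- round(100 * prop.table(counts, margin = 1))
pct
```

Without margin = 1, prop.table divides by the grand total of the whole table, which only matches the per-country percentages when the table has a single row.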
Data
df1 <- structure(list(country = c("Russia", "Russia", "Russia", "Russia",
"Russia", "Russia", "Russia", "Russia", "Russia", "Russia", "Russia",
"Russia", "Russia", "Russia", "Russia", "Russia", "Russia", "Russia",
"Russia", "Russia", "Russia", "Russia", "Russia", "Russia", "Russia"
), hobby = c("Play Golf", "Reading", "Reading", "Reading", "Reading",
"Cycling", "Reading", "Reading", "Reading", "Surfing", NA, "Gaming",
"Reading", "Running", "Reading", "Running", "Reading", "Gaming",
"Reading", "Prefer not say", "Play Golf", "Gaming", "Reading",
"Prefer not say", "Reading")), class = "data.frame", row.names = c("7",
"12", "17", "20", "21", "22", "28", "33", "35", "41", "48", "61",
"65", "70", "74", "79", "86", "87", "90", "92", "95", "96", "97",
"98", "108"))