Aggregate subset values by string name and calculate each value's percentage in R

Problem description

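I have a subset of my data. I want to aggregate the hobby column so that each hobby gets its own column holding its count. Ideally the result looks like step 2, but I would also be glad for help with step 1 alone. The subset has 25 values in total. In step 2, each percentage is the hobby's count in the subset divided by the total count; for example, the percentage for playGolf is 2/25 = 8%. (Note: I have replaced empty rows with NA and kept NA as a column.)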

df:

     country hobby
     <chr>  <chr>
7   Russia  Play Golf
12  Russia  Reading
17  Russia  Reading
20  Russia  Reading
21  Russia  Reading
22  Russia  Cycling
28  Russia  Reading
33  Russia  Reading
35  Russia  Reading
41  Russia  Surfing
48  Russia  NA
61  Russia  Gaming
65  Russia  Reading
70  Russia  Running
74  Russia  Reading
79  Russia  Running
86  Russia  Reading
87  Russia  Gaming
90  Russia  Reading
92  Russia  Prefer not say
95  Russia  Play Golf
96  Russia  Gaming
97  Russia  Reading
98  Russia  Prefer not say
108 Russia  Reading

Expected step 1:

country    playGolf    Reading    Cycling   Surfing   Gaming  Running     PreNSay     NA

Russia        2         13          1           1       3        2           2         1

Expected step 2:

country  playGolf(%) Reading(%) Cycling(%) Surfing(%) Gaming(%) Running(%) PreNSay(%) NA(%)
 Russia       8         52        4            4           12       8          8        4

After this, I will combine it with other subsets like this (but I can handle that part myself):

country  playGolf(%) Reading(%) Cycling(%) Surfing(%) Gaming(%) Running(%) PreNSay(%) NA(%)

 Russia       8         52        4            4           12       8          8        4
 Poland       12        24        3            5           10       2          5        1
   ..
 etc...

How can I do this? Thanks!

Tags: r, dataframe

Solution


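We can use table to get the frequency counts. By default, table leaves out NA values, so the NA row is not counted in the output.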

table(df1)

-output

#  hobby
#country  Cycling Gaming Play Golf Prefer not say Reading Running Surfing
#  Russia       1      3         2              2      13       2       1
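
If a data frame with one column per hobby is needed, as in the expected step 1, one possible sketch is to convert the contingency table with as.data.frame.matrix. The df1 built here is a compact stand-in, assumed to have the same counts as the original data; the name wide is only illustrative.

```r
# Compact stand-in for df1: same counts as the original data, row order differs
df1 <- data.frame(
  country = "Russia",
  hobby = rep(
    c("Cycling", "Gaming", "Play Golf", "Prefer not say",
      "Reading", "Running", "Surfing", NA),
    times = c(1, 3, 2, 2, 13, 2, 1, 1)
  )
)

# One row per country, one column per hobby; NA is dropped by table's default
wide <- as.data.frame.matrix(table(df1))
wide
```

The country ends up in the row names. If it should be a regular column, something like cbind(country = rownames(wide), wide) would be one way to add it.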

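The percentages can then be obtained with prop.table. Because the NA row was left out of the counts, the denominator is 24 rather than 25, which is why Reading shows 54 instead of the 52 expected in the question.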

round(100 * prop.table(table(df1)))

-output

#   hobby
#country  Cycling Gaming Play Golf Prefer not say Reading Running Surfing
#  Russia       4     12         8              8      54       8       4
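
The question counts NA as its own category, with 25 as the denominator. A minimal sketch of that variant, assuming the useNA argument of table is acceptable here, is shown below. The df1 is a compact stand-in assumed to have the same counts as the original data, and the names counts and pct are only illustrative.

```r
# Compact stand-in for df1: same counts as the original data, row order differs
df1 <- data.frame(
  country = "Russia",
  hobby = rep(
    c("Cycling", "Gaming", "Play Golf", "Prefer not say",
      "Reading", "Running", "Surfing", NA),
    times = c(1, 3, 2, 2, 13, 2, 1, 1)
  )
)

# Keep NA as its own column so that all 25 rows are counted
counts <- table(df1, useNA = "ifany")

# margin = 1 computes the percentages within each country (row)
pct <- round(100 * prop.table(counts, margin = 1))
pct
```

With margin = 1 each row is scaled by its own total, so the same call should carry over when the data frame holds several countries, giving one row of percentages per country.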

data

df1 <- structure(list(country = c("Russia", "Russia", "Russia", "Russia", 
"Russia", "Russia", "Russia", "Russia", "Russia", "Russia", "Russia", 
"Russia", "Russia", "Russia", "Russia", "Russia", "Russia", "Russia", 
"Russia", "Russia", "Russia", "Russia", "Russia", "Russia", "Russia"
), hobby = c("Play Golf", "Reading", "Reading", "Reading", "Reading", 
"Cycling", "Reading", "Reading", "Reading", "Surfing", NA, "Gaming", 
"Reading", "Running", "Reading", "Running", "Reading", "Gaming", 
"Reading", "Prefer not say", "Play Golf", "Gaming", "Reading", 
"Prefer not say", "Reading")), class = "data.frame", row.names = c("7", 
"12", "17", "20", "21", "22", "28", "33", "35", "41", "48", "61", 
"65", "70", "74", "79", "86", "87", "90", "92", "95", "96", "97", 
"98", "108"))
