How do I summarise a column and count the distinct values?

Problem description

I have a data frame like this:

Time       Name    Vote
20100102   Bob     Positive
20100104   Carlos  Negative
20100106   Kinder  Negative
20100106   Tony    Positive
.
.
.

I want to group the data by year and find the number of positive and negative votes for each year.
The expected result is:

Year    Positive    Negative
2010       1201        891
2011       2039        189
.
.

The code I used:

vote_year <- infile %>%
  group_by(Year = cut(Time,breaks = seq(20100100,20210100,by=10000))) %>%
  summarise(Positive = n(Vote == 'Positive'),Nagative = n(Vote == 'Negative')) %>%
  mutate(Year = seq(2010,2020))

I believe the problem is in the summarise call, but I don't know how to fix it. It also seems that group_by is not creating a proper data frame.

Tags: r, dataframe

Solution


Base R

table(lubridate::year(lubridate::ymd(df1$Time)), df1$Vote)

#OR

table(substr(df1$Time, 1, 4), df1$Vote)

       Negative Positive
  2010        2        2
  2011        2        0
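
The table above has the right counts but not quite the layout asked for in the question (a Year column followed by Positive and Negative). A minimal sketch of one way to get there in base R follows. It assumes Time is stored as a yyyymmdd number, as in the sample data; the name vote_year is borrowed from the question and the intermediate names tab and wide are only illustrative.

```r
# Sample data matching the rows used in this answer
df1 <- data.frame(
  Time = c(20100102, 20100104, 20100106, 20100106, 20110104, 20110106),
  Name = c("Bob", "Carlos", "Kinder", "Tony", "Carlo", "Walt"),
  Vote = c("Positive", "Negative", "Negative", "Positive", "Negative", "Negative")
)

# Assumption: Time is numeric yyyymmdd, so integer division by 10000 yields the year
tab <- table(Year = df1$Time %/% 10000, Vote = df1$Vote)

# as.data.frame.matrix() keeps the wide layout (one column per vote type)
wide <- as.data.frame.matrix(tab)

# Rebuild a plain data frame in the column order the question asks for
vote_year <- data.frame(
  Year = as.integer(rownames(wide)),
  Positive = wide$Positive,
  Negative = wide$Negative
)
vote_year
```

If Time is stored as character instead, substr(df1$Time, 1, 4) can take the place of the integer division.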

library(janitor) also helps:

df1 <- read.table(header = TRUE, text = "
                  Time       Name    Vote
20100102   Bob     Positive
20100104   Carlos  Negative
20100106   Kinder  Negative
20100106   Tony    Positive
20110104   Carlo   Negative
20110106   Walt    Negative                  ")

library(lubridate)
library(janitor)
library(dplyr)

df1 %>% mutate(year = year(ymd(Time))) %>%
  tabyl(year, Vote)
#>  year Negative Positive
#>  2010        2        2
#>  2011        2        0

janitor is more helpful still, because you can build richer summary tables like this:

df1 %>% mutate(year = year(ymd(Time))) %>%
  tabyl(year, Vote) %>%
  adorn_totals(c('row', 'col')) %>%
  adorn_percentages() %>%
  adorn_pct_formatting(digits = 2) %>%
  adorn_ns("front")

  year    Negative   Positive       Total
  2010 2  (50.00%) 2 (50.00%) 4 (100.00%)
  2011 2 (100.00%) 0  (0.00%) 2 (100.00%)
 Total 4  (66.67%) 2 (33.33%) 6 (100.00%)
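
For comparison, the summarise call from the question can also be repaired directly. n() takes no arguments, so n(Vote == 'Positive') fails; summing the logical vector counts the matching rows instead. The following is a sketch under the assumption that Time is a yyyymmdd number, and it replaces the cut() and mutate() steps with an integer division, which is a simplification rather than the asker's original approach.

```r
library(dplyr)

# Sample data matching the rows used in this answer
df1 <- data.frame(
  Time = c(20100102, 20100104, 20100106, 20100106, 20110104, 20110106),
  Name = c("Bob", "Carlos", "Kinder", "Tony", "Carlo", "Walt"),
  Vote = c("Positive", "Negative", "Negative", "Positive", "Negative", "Negative")
)

vote_year <- df1 %>%
  # Assumption: Time is numeric yyyymmdd, so %/% 10000 extracts the year
  group_by(Year = Time %/% 10000) %>%
  # sum() over a logical vector counts the TRUE values in each group
  summarise(
    Positive = sum(Vote == "Positive"),
    Negative = sum(Vote == "Negative")
  )
vote_year
```

Deriving Year inside group_by also removes the need for the trailing mutate(Year = seq(2010, 2020)), which would fail whenever a year in that range has no rows.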
