Count characters and summarize values per group

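Problem description

I can't seem to find the right code for my problem. I want to create groups based on different conditions and summarize (sum, count, or length) other columns.

I have tried group_by and summarize with different conditions, but haven't found anything that works.

I have a table like this: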

data <- data.frame(Name= c('Anna', 'Anna', 'Anna', 'Anna', 'Anna',
                       'Bella', 'Bella', 'Bella', 'Camilla', 'Camilla'),
               Date= c('1.1.2021', '1.1.2021', '2.1.2021', '3.1.2021', '3.1.2021', 
                       '1.1.2021', '5.1.2021', '5.1.2021', '7.1.2021', '8.1.2021'),
               Item= c('Apple','Pear', 'Zucini','Apple', 'Broccoli',
                       'Apple','Pear','Apple','Apple', 'Tomato'),
               Category= c('Fruit', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 
                           'Fruit', 'Fruit', 'Fruit', 'Fruit', 'Vegetable'),
               Weight_kg= c(0.2,0.3,0.5,0.4,1.1,
                            1,0.5,0.8,1.2,0.5)
               )

This would be my desired output:

desired_table <- data.frame(Name=c('Anna', 'Bella', 'Camilla'),
Shopping_days=c(3,2,2),
days_fruit=c(2,2,1),
days_vegetables=c(2,0,1),
Total_kg=c(2.5,2.3,1.7),
Fruit_kg=c(0.9,2.3,1.2),
Vegetables=c(1.6,0,0.5))

I have tried many variations of code similar to this, but it obviously doesn't work:

data1 <- data %>%
group_by(Name) %>%
summarize(Shopping_days = length(unique(Date)),
days_fruit = length(unique(Date, Category='Fruit')),
days_vegetables = length(unique(Date, Category='Vegetables')),
Total_kg = sum(Weight_kg),
Fruit_kg = sum(Weight_kg, if Category=Fruit),
Vegetables_kg = sum(Weight_kg, if Category=Vegetables))

Any help would be greatly appreciated.

Tags: r, group-by, sum, countif, summarize

Solution


Use group_by and summarise, subsetting each column by Category inside summarise:

library(dplyr)

data %>%
  group_by(Name) %>%
  summarise(Shopping_days = n_distinct(Date), 
            # Count distinct dates (not items) on which each category was bought
            days_fruit = n_distinct(Date[Category == 'Fruit']), 
            days_vegetables = n_distinct(Date[Category == 'Vegetable']), 
            Total_kg = sum(Weight_kg), 
            Fruit_kg = sum(Weight_kg[Category == 'Fruit']), 
            Vegetables_kg = sum(Weight_kg[Category == 'Vegetable']))

#  Name    Shopping_days days_fruit days_vegetables Total_kg Fruit_kg Vegetables_kg
#  <chr>           <int>      <int>           <int>    <dbl>    <dbl>         <dbl>
#1 Anna                3          2               2      2.5      0.9           1.6
#2 Bella               2          2               0      2.3      2.3           0  
#3 Camilla             2          1               1      1.7      1.2           0.5
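
Why the attempt in the question fails, as far as I can tell: unique(Date, Category='Fruit') does not filter anything, because the extra named argument is not treated as a condition, and sum(Weight_kg, if Category=Fruit) is not valid R syntax. The usual idiom is to index the vector with a logical condition before summarizing it. Here is a minimal sketch of that idiom on a small made-up table (the `toy` data frame and `result` name are hypothetical, chosen only for illustration):

```r
library(dplyr)

# Small illustrative data set (hypothetical, not the full table from the question)
toy <- data.frame(
  Name      = c('Anna', 'Anna', 'Anna', 'Bella'),
  Date      = c('1.1.2021', '1.1.2021', '2.1.2021', '1.1.2021'),
  Category  = c('Fruit', 'Fruit', 'Vegetable', 'Fruit'),
  Weight_kg = c(0.2, 0.3, 0.5, 1)
)

# Category == 'Fruit' yields a logical vector; indexing with it keeps
# only the matching elements before the summary function is applied.
result <- toy %>%
  group_by(Name) %>%
  summarise(
    days_fruit      = n_distinct(Date[Category == 'Fruit']),
    days_vegetables = n_distinct(Date[Category == 'Vegetable']),
    Fruit_kg        = sum(Weight_kg[Category == 'Fruit']),
    Vegetables_kg   = sum(Weight_kg[Category == 'Vegetable'])
  )

print(result)
```

A group with no matching rows (Bella has no vegetables here) simply gets an empty vector, so n_distinct and sum both return 0 for it rather than raising an error.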
