Summarizing duplicate entries in a data frame in R

Problem description

I have a data frame containing the following data:

#sample data
Date    <-  c(  "2020-01-01",   "2020-01-01",   "2020-01-01",   "2020-01-01",   "2020-01-01",   "2020-01-02",   "2020-01-02",   "2020-01-02",   "2020-01-02")
Salesperson <-c (   "Sales1",   "Sales1",   "Sales1",   "Sales2",   "Sales2",   "Sales1",   "Sales1",   "Sales2",   "Sales2"    )
Clothing    <-c (   "5",    "2",    "8",    "3",    "3",    "4",    "7",    "3",    "4" )
Electronics <-c (   "6",    "9",    "1",    "2",    "1",    "2",    "2",    "1",    "2" )

data<-data.frame(Date,Salesperson,Clothing,Electronics, stringsAsFactors = FALSE)
data$Date<-as.Date(data$Date,"%Y-%m-%d")

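In the df there are several rows where a salesperson recorded their sales more than once on the same date instead of adding them up.

The result I want is shown by the data frame below: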

Date    <-  c   (   "2020-01-01",   "2020-01-01",   "2020-01-02",   "2020-01-02"    )
Salesperson <-  c   (   "Sales1",   "Sales2",   "Sales1",   "Sales2")
Clothing    <-  c   (   "15",   "6",    "11",   "7" )
Electronics <-  c   (   "16",   "3",    "4",    "3" )
data1<-data.frame(Date,Salesperson,Clothing,Electronics, stringsAsFactors = FALSE)

Does anyone know how to get this result?

Tags: r

Solution


To sum your data, you need to supply the numbers as numeric values rather than character strings. Note the as.numeric() I added around your Clothing and Electronics variables:

Clothing    <-as.numeric(c (   "5",    "2",    "8",    "3",    "3",    "4",    "7",    "3",    "4" ))
Electronics <-as.numeric(c (   "6",    "9",    "1",    "2",    "1",    "2",    "2",    "1",    "2" ))
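Note that redefining the vectors only helps if the data frame is built (or rebuilt) afterwards. If data already exists with character columns, the columns can be converted in place instead. A minimal sketch, assuming the sample data from the question:

```r
# Rebuild the question's sample data; the sales figures are stored as character strings
data <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
                   "2020-01-02", "2020-01-02", "2020-01-02", "2020-01-02")),
  Salesperson = c("Sales1", "Sales1", "Sales1", "Sales2", "Sales2",
                  "Sales1", "Sales1", "Sales2", "Sales2"),
  Clothing    = c("5", "2", "8", "3", "3", "4", "7", "3", "4"),
  Electronics = c("6", "9", "1", "2", "1", "2", "2", "1", "2"),
  stringsAsFactors = FALSE
)

# Convert the two sales columns from character to numeric in place
data$Clothing    <- as.numeric(data$Clothing)
data$Electronics <- as.numeric(data$Electronics)

# Inspect the column types to confirm the conversion
str(data)
```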

Now, to summarise using sums, try:

library(dplyr)
data %>% 
 group_by(Date, Salesperson) %>%
 summarise(sum_cloth=(sum(Clothing)), sum_elec=sum(Electronics))
# Groups:   Date [2]
  Date       Salesperson sum_cloth sum_elec
  <chr>      <chr>           <dbl>    <dbl>
1 2020-01-01 Sales1             15       16
2 2020-01-01 Sales2              6        3
3 2020-01-02 Sales1             11        4
4 2020-01-02 Sales2              7        3
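For comparison, the same per-date, per-salesperson totals can be computed without dplyr. This is not the approach used in the answer above, only a sketch of a base R alternative using aggregate(), assuming the sample data from the question with numeric sales columns. It keeps the original column names, as in the desired result:

```r
# Sample data from the question, with the sales figures already numeric
data <- data.frame(
  Date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
                   "2020-01-02", "2020-01-02", "2020-01-02", "2020-01-02")),
  Salesperson = c("Sales1", "Sales1", "Sales1", "Sales2", "Sales2",
                  "Sales1", "Sales1", "Sales2", "Sales2"),
  Clothing    = c(5, 2, 8, 3, 3, 4, 7, 3, 4),
  Electronics = c(6, 9, 1, 2, 1, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both sales columns within each Date/Salesperson combination
result <- aggregate(cbind(Clothing, Electronics) ~ Date + Salesperson,
                    data = data, FUN = sum)

# Sort by date, then salesperson
result <- result[order(result$Date, result$Salesperson), ]
print(result)
```

The order() call is only there to line the rows up with the desired result in the question; the sums themselves do not depend on it.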
