Count occurrences of cases across several columns

Question

I have a data frame:

ID   Date            col1  col2  
1    1606807860      LOY    A
2    1606807860      LOY    B
2    1606807860      LOY    B
3    1606807860      LOY    B
1    1606807860      LOY    A

I want to count how many times each unique combination of ID, Date, col1 and col2 occurs. The desired result is:

ID     Date              event    count
1    1606807860          loy-a      2
2    1606807860          loy-b      2
3    1606807860          loy-b      1

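How can I do that? Also, how can I convert the timestamp into a readable format instead of 1606807860, that is, change the column to a date type such as year-month-day?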

This works when only col1 and col2 are involved:

library(dplyr)
library(tidyr)

df %>%
  mutate(across(c(col1, col2), tolower)) %>%
  count(col1, col2) %>%
  unite(event, col1, col2, sep = '-')

Tags: r, dataframe, count

Answer


A base R option

aggregate(
  n ~ .,
  transform(
    df,
    event = tolower(paste(col1, col2, sep = "-")),
    Date = as.Date(as.POSIXct(Date, origin = "1970-01-01")),
    n = 1,
    col1 = NULL,
    col2 = NULL
  ),
  sum
)

This gives

  ID       Date event n
1  1 2020-12-01 loy-a 2
2  2 2020-12-01 loy-b 2
3  3 2020-12-01 loy-b 1
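
The date conversion used inside transform() can also be checked on its own. The sketch below takes the timestamp from the example data and turns it into a date-time and then a plain year-month-day date. The variable names ts, dt and d are made up for illustration, and tz = "UTC" is an assumption; use whichever time zone the timestamps were recorded in.

```r
# Epoch seconds from the example data (seconds since 1970-01-01 00:00:00 UTC).
ts <- 1606807860L

# Convert to a date-time. tz = "UTC" is an assumption, not something the question states.
dt <- as.POSIXct(ts, origin = "1970-01-01", tz = "UTC")

# Keep only the calendar date (year-month-day).
d <- as.Date(dt, tz = "UTC")

format(dt, "%Y-%m-%d %H:%M:%S")  # "2020-12-01 07:31:00"
format(d, "%Y-%m-%d")            # "2020-12-01"
```

With a different time zone the clock time changes, and for timestamps close to midnight the calendar date can change as well, so it is worth setting tz explicitly rather than relying on the session default.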

A data.table option

library(data.table)
setDT(df)
df[, Date := as.Date(as.POSIXct(Date, origin = "1970-01-01"))][, .(event = tolower(paste(col1, col2, sep = "-")), n = .N), by = names(df)][, c("col1", "col2") := NULL][]

This gives

   ID       Date event n
1:  1 2020-12-01 loy-a 2
2:  2 2020-12-01 loy-b 2
3:  3 2020-12-01 loy-b 1
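
A dplyr/tidyr option

The pipeline from the question can probably be extended in the same spirit: convert Date first, then include ID and Date in count(). This is only a sketch, not part of the original answer. It rebuilds the example data so that it runs on its own, names the counter column count to match the desired output, and assumes UTC for the date conversion.

```r
library(dplyr)
library(tidyr)

# Example data, rebuilt here so the sketch is self-contained.
df <- data.frame(
  ID   = c(1L, 2L, 2L, 3L, 1L),
  Date = rep(1606807860L, 5),
  col1 = rep("LOY", 5),
  col2 = c("A", "B", "B", "B", "A")
)

res <- df %>%
  mutate(
    # Lower-case the two label columns, as in the question's attempt.
    across(c(col1, col2), tolower),
    # Epoch seconds -> calendar date; tz = "UTC" is an assumption.
    Date = as.Date(as.POSIXct(Date, origin = "1970-01-01", tz = "UTC"), tz = "UTC")
  ) %>%
  # One row per unique ID/Date/col1/col2 combination, with its frequency.
  count(ID, Date, col1, col2, name = "count") %>%
  # Merge col1 and col2 into a single "event" label such as "loy-a".
  unite(event, col1, col2, sep = "-")

res
```

The result should have the columns ID, Date, event and count, one row per ID in this example.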

Data

> dput(df)
structure(list(ID = c(1L, 2L, 2L, 3L, 1L), Date = c(1606807860L,
1606807860L, 1606807860L, 1606807860L, 1606807860L), col1 = c("LOY",
"LOY", "LOY", "LOY", "LOY"), col2 = c("A", "B", "B", "B", "A"
)), class = "data.frame", row.names = c(NA, -5L))
