r - Grouping data in a way comparable to Pandas group-by and Grouper
Problem description
I have a dataset of hierarchical events, with one row per event.
TIME              level1  level2  Occurrence
29/11/2019 00:05  A       a       1
29/11/2019 00:05  B       a       1
29/11/2019 00:07  B       b       1
29/11/2019 00:20  B       b       1
29/11/2019 00:05  B       c       1
29/11/2019 01:20  A       a       1
29/11/2019 01:25  A       a       1
29/11/2019 02:00  A       a       2
29/11/2019 02:00  B       a       1
29/11/2019 02:00  B       b       1
29/11/2019 02:35  B       b       1
29/11/2019 02:49  B       c       1
I aggregate it with Pandas groupby and Grouper to get output like the following:
df_agg = df.groupby([pd.Grouper(freq='H'), 'level1', pd.Grouper('level2')])
df_agg.count()
TIME              level1  level2  Count
29/11/2019 00:00  A       a       1
                  B       a       1
                  B       b       2
                  B       c       1
29/11/2019 01:00  A       a       2
29/11/2019 02:00  A       a       2
                  B       a       1
                  B       b       2
                  B       c       1
Can I achieve something similar in R?
Here is a snippet that creates a dataset similar to the one I am working with:
import pandas as pd

# Sample data similar to the real dataset (this sample has no Occurrence column)
data = {"TIME" : ['29/11/2019 00:05:00', '29/11/2019 00:05:00', '29/11/2019 00:07:00', '29/11/2019 00:20:00',
'29/11/2019 00:05:00', '29/11/2019 01:20:00', '29/11/2019 01:25:00', '29/11/2019 02:00:00',
'29/11/2019 02:00:00', '29/11/2019 02:00:00', '29/11/2019 02:35:00', '29/11/2019 02:49:00'],
"level1" : ["A", "B", "B", "B", "B", "A", "A", "A", "B","B", "B", "B"],
"level2" : ["a", "a", "b", "b", "c", "a", "a", "a", "a", "b", "b","c"]}
tmp_df = pd.DataFrame(data)
tmp_df = tmp_df.set_index('TIME')
# The timestamps are day-first, so state the format explicitly
tmp_df.index = pd.to_datetime(tmp_df.index, format='%d/%m/%Y %H:%M:%S')
Solution
We can use the dplyr package:
library(dplyr)

# Truncate each timestamp to the start of its hour, then count rows per group
dat %>%
  group_by(TIME = format(TIME, format = '%d/%m/%Y %H:00:00'), level1, level2) %>%
  count(name = "Count")
#> # A tibble: 9 x 4
#> # Groups: TIME, level1, level2 [9]
#> TIME level1 level2 Count
#> <chr> <chr> <chr> <int>
#> 1 29/11/2019 00:00:00 A a 1
#> 2 29/11/2019 00:00:00 B a 1
#> 3 29/11/2019 00:00:00 B b 2
#> 4 29/11/2019 00:00:00 B c 1
#> 5 29/11/2019 01:00:00 A a 2
#> 6 29/11/2019 02:00:00 A a 1
#> 7 29/11/2019 02:00:00 B a 1
#> 8 29/11/2019 02:00:00 B b 2
#> 9 29/11/2019 02:00:00 B c 1
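Note that this output gives a Count of 1 for the 02:00 / A / a group, while the expected output in the question shows 2. That group has a single row whose Occurrence is 2, so the difference appears to be between counting rows and summing Occurrence. The following is a minimal plain-Python sketch of the same hourly bucketing that computes both figures side by side. The helper name hour_bucket and the variable names are illustrative only, and the rows are copied from the question's table; it is not a replacement for the dplyr or pandas code.

```python
from collections import Counter
from datetime import datetime

# Rows copied from the question: (TIME, level1, level2, Occurrence)
rows = [
    ("29/11/2019 00:05", "A", "a", 1),
    ("29/11/2019 00:05", "B", "a", 1),
    ("29/11/2019 00:07", "B", "b", 1),
    ("29/11/2019 00:20", "B", "b", 1),
    ("29/11/2019 00:05", "B", "c", 1),
    ("29/11/2019 01:20", "A", "a", 1),
    ("29/11/2019 01:25", "A", "a", 1),
    ("29/11/2019 02:00", "A", "a", 2),
    ("29/11/2019 02:00", "B", "a", 1),
    ("29/11/2019 02:00", "B", "b", 1),
    ("29/11/2019 02:35", "B", "b", 1),
    ("29/11/2019 02:49", "B", "c", 1),
]


def hour_bucket(stamp):
    """Truncate a 'dd/mm/YYYY HH:MM' string to the start of its hour (hypothetical helper)."""
    parsed = datetime.strptime(stamp, "%d/%m/%Y %H:%M")
    return parsed.replace(minute=0, second=0, microsecond=0)


row_counts = Counter()       # number of rows per (hour, level1, level2)
occurrence_sums = Counter()  # sum of Occurrence per (hour, level1, level2)

for stamp, level1, level2, occurrence in rows:
    key = (hour_bucket(stamp), level1, level2)
    row_counts[key] += 1
    occurrence_sums[key] += occurrence

# Print one line per group: hour, level1, level2, row count, Occurrence sum
for key in sorted(row_counts):
    hour, level1, level2 = key
    print(hour.strftime("%d/%m/%Y %H:%M"), level1, level2,
          row_counts[key], occurrence_sums[key])
```

The two figures differ only for the 02:00 / A / a group. If the summed figure is the one wanted, dplyr's count() accepts a wt argument for weighted counts (for example wt = Occurrence), which should give the sum rather than the row count.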
Data: this is the data I used. Please use dput(dat) rather than copy/paste to provide your data.
structure(list(TIME = structure(c(1574985900, 1574985900, 1574986020,
1574986800, 1574985900, 1574990400, 1574990700, 1574992800, 1574992800,
1574992800, 1574994900, 1574995740), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), level1 = c("A", "B", "B", "B", "B", "A", "A",
"A", "B", "B", "B", "B"), level2 = c("a", "a", "b", "b", "c",
"a", "a", "a", "a", "b", "b", "c"), Occurrence = c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L), spec = structure(list(
cols = list(TIME = structure(list(format = "%d/%m/%Y %H:%M"), class = c("collector_datetime",
"collector")), level1 = structure(list(), class = c("collector_character",
"collector")), level2 = structure(list(), class = c("collector_character",
"collector")), Occurrence = structure(list(), class = c("collector_integer",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1), class = "col_spec"))