r - 创建月份 id 列
问题描述
我有一个当前看起来像这样的数据框:
Month Park
<date> <chr>
2019-04-01 Arbour Lake East
2019-07-01 Arbour Lake East
2019-07-01 Arbour Lake East
2019-09-01 Arbour Lake East
2019-09-01 Arbour Lake East
2019-10-01 Arbour Lake East
2020-01-01 Arbour Lake East
2020-01-01 Arbour Lake East
2020-02-01 Arbour Lake East
2020-02-01 Arbour Lake East
2020-03-01 Arbour Lake East
2020-04-01 Arbour Lake East
2020-05-01 Arbour Lake East
2020-11-01 Arbour Lake East
2020-12-01 Arbour Lake East
2021-04-01 Arbour Lake East
2019-09-01 Arbour Lake West
2019-09-01 Arbour Lake West
2019-10-01 Arbour Lake West
2020-05-01 Arbour Lake West
我想创建一个新列,月份 id,其中 1 是在特定公园中发现的第一个月,而 2 是在同一公园中的第二个月(独立于这些月是否实际上是连续的。例如,1可能是 9 月,因为它是 Nosehill 砾石坑公园的第一个月,2 可能是 11 月,因为它是 Nosehill 砾石坑公园的第二个月)。一些 id (1, 2, 3, ...) 在不同的公园中是相同的,因为它们只代表公园的第一个月。同一公园内完全相同的月份(月/年)也将收到相同的 ID。
这是我希望该列看起来的样子:
Month Month_id Park
2019-04-01 01 Arbour Lake East
2019-07-01 02 Arbour Lake East
2019-07-01 02 Arbour Lake East
2019-09-01 03 Arbour Lake East
2019-09-01 03 Arbour Lake East
2019-10-01 04 Arbour Lake East
2020-01-01 05 Arbour Lake East
2020-01-01 05 Arbour Lake East
2020-02-01 06 Arbour Lake East
2020-02-01 06 Arbour Lake East
2020-03-01 07 Arbour Lake East
2020-04-01 08 Arbour Lake East
2020-05-01 09 Arbour Lake East
2020-11-01 10 Arbour Lake East
2020-12-01 11 Arbour Lake East
2021-04-01 12 Arbour Lake East
2019-09-01 01 Arbour Lake West
2019-09-01 01 Arbour Lake West
2019-10-01 02 Arbour Lake West
2020-05-01 03 Arbour Lake West
我真的不知道该怎么做,所以任何领导都会非常感激!
更多信息:
> dput(Data.frame[1:4])
structure(list(Month = structure(c(18383, 18383, 18414, 18414,
18444, 18718, 18322, 18687, 18687, 18293, 18293, 18383, 18444,
18475, 18506, 18536, 18567, 18567, 18628, 18748, 18748, 18779,
18809, 18078, 18078, 18109, 18109, 18628, 18628, 18444, 18444,
18475), class = "Date"), Park = c("Aspen Heights", "Aspen Heights",
"Aspen Heights", "Aspen Heights", "Aspen Heights", "Aspen Heights",
"Auburn Bay", "Auburn Bay", "Auburn Bay", "Bayview", "Bayview",
"Bayview", "Bayview", "Bayview", "Bayview", "Bayview", "Bayview",
"Bayview", "Bayview", "Bayview", "Bayview", "Bayview", "Bayview",
"Cranston", "Cranston", "Cranston", "Cranston", "Cranston", "Cranston",
"Currie Barracks", "Currie Barracks", "Currie Barracks"), Aggr_Code = c("1",
"2", "1", "2", "1", "1", "1", "1", "2", "1", "2", "1", "1", "1",
"1", "1", "1", "2", "1", "1", "2", "1", "1", "1", "2", "1", "2",
"1", "2", "1", "2", "1"), AC_events_per_month = c(4, 1, 4, 1,
2, 1, 1, 2, 1, 1, 1, 1, 3, 2, 4, 2, 6, 2, 3, 1, 1, 1, 1, 8, 4,
2, 1, 3, 3, 2, 1, 1)), row.names = c(NA, -32L), groups = structure(list(
Month = structure(c(18078, 18109, 18293, 18322, 18383, 18383,
18414, 18444, 18444, 18444, 18475, 18475, 18506, 18536, 18567,
18628, 18628, 18687, 18718, 18748, 18779, 18809), class = "Date"),
Park = c("Cranston", "Cranston", "Bayview", "Auburn Bay",
"Aspen Heights", "Bayview", "Aspen Heights", "Aspen Heights",
"Bayview", "Currie Barracks", "Bayview", "Currie Barracks",
"Bayview", "Bayview", "Bayview", "Bayview", "Cranston", "Auburn Bay",
"Aspen Heights", "Bayview", "Bayview", "Bayview"), .rows = structure(list(
24:25, 26:27, 10:11, 7L, 1:2, 12L, 3:4, 5L, 13L, 30:31,
14L, 32L, 15L, 16L, 17:18, 19L, 28:29, 8:9, 6L, 20:21,
22L, 23L), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -22L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
解决方案
这是使用libr包中的datastep()
函数的解决方案。
首先,创建您的示例数据:
# Create data
df <- read.table(header = TRUE, text = '
Month Park
2019-04-01 "Arbour Lake East"
2019-07-01 "Arbour Lake East"
2019-07-01 "Arbour Lake East"
2019-09-01 "Arbour Lake East"
2019-09-01 "Arbour Lake East"
2019-10-01 "Arbour Lake East"
2020-01-01 "Arbour Lake East"
2020-01-01 "Arbour Lake East"
2020-02-01 "Arbour Lake East"
2020-02-01 "Arbour Lake East"
2020-03-01 "Arbour Lake East"
2020-04-01 "Arbour Lake East"
2020-05-01 "Arbour Lake East"
2020-11-01 "Arbour Lake East"
2020-12-01 "Arbour Lake East"
2021-04-01 "Arbour Lake East"
2019-09-01 "Arbour Lake West"
2019-09-01 "Arbour Lake West"
2019-10-01 "Arbour Lake West"
2020-05-01 "Arbour Lake West"')
df$Month <- as.Date(df$Month)
其次,生成ID列。datastep 将逐行遍历数据帧。该参数在Month和Parkby
上设置按组。然后,您可以查看更改的Park值,以使用该构造重置每个 Park 的 ID 。data[n. -1, "Park"]
library(libr)
# Perform datastep to calculate id
df2 <- datastep(df, by = c("Month", "Park"),
retain = list(Month_id = 0),
keep = c("Month", "Month_id", "Park"),
{
if (n. > 1) {
if (Park != data[n. - 1, "Park"])
Month_id <- 0
}
if (first.) {
Month_id <- Month_id + 1
}
})
# Add leading zero to id
df2$Month_id <- sprintf("%02d", df2$Month_id)
结果如下:
df2
# Month Month_id Park
# 1 2019-04-01 01 Arbour Lake East
# 2 2019-07-01 02 Arbour Lake East
# 3 2019-07-01 02 Arbour Lake East
# 4 2019-09-01 03 Arbour Lake East
# 5 2019-09-01 03 Arbour Lake East
# 6 2019-10-01 04 Arbour Lake East
# 7 2020-01-01 05 Arbour Lake East
# 8 2020-01-01 05 Arbour Lake East
# 9 2020-02-01 06 Arbour Lake East
# 10 2020-02-01 06 Arbour Lake East
# 11 2020-03-01 07 Arbour Lake East
# 12 2020-04-01 08 Arbour Lake East
# 13 2020-05-01 09 Arbour Lake East
# 14 2020-11-01 10 Arbour Lake East
# 15 2020-12-01 11 Arbour Lake East
# 16 2021-04-01 12 Arbour Lake East
# 17 2019-09-01 01 Arbour Lake West
# 18 2019-09-01 01 Arbour Lake West
# 19 2019-10-01 02 Arbour Lake West
# 20 2020-05-01 03 Arbour Lake West
您也可以使用dplyr执行此操作。但我会把这个留给其他人来回答。
推荐阅读
- pytorch - 训练 DNN 时如何监控 GPU 内存使用情况?
- python - 如何将显示尺寸映射到熊猫数据框列中的标准尺寸?
- python - 如何让我的 Python 脚本在云上运行?
- c++ - 即使构造函数具有可观察到的副作用,是否允许在标准中消除未使用对象的构造?
- java - Android Studio Java转换值应用程序不断崩溃
- c++ - 将子类复制到父类,而父类有其他构造函数
- python - PyQt - 获取网格中单击按钮的行和列
- tensorflow-lite - YOLO 的输出是什么?
- mysql - Grafana:使用字符串类型的 MySQL 表列作为图形的 X 轴
- elasticsearch - Pod 无法卷曲 localhost