r - 根据截止年份添加元素和列?
问题描述
structure(list(`2005` = c(0L, 0L, 0L, 2L, 1L), `2006` = c(0L,
0L, 0L, 1L, 1L), `2007` = c(1L, 0L, 1L, 0L, 3L), `2008` = c(1L,
0L, 0L, 4L, 3L), `2009` = c(1L, 0L, 0L, 2L, 3L), `2010` = c(0L,
0L, 0L, 5L, 0L), `2011` = c(0L, 0L, 0L, 0L, 1L), `2012` = c(0L,
0L, 0L, 4L, 1L), `2013` = c(1L, 0L, 1L, 0L, 0L), `2014` = c(0L,
0L, 2L, 0L, 9L), `2015` = c(0L, 0L, 1L, 0L, 2L), `2016` = c(0L,
0L, 0L, 0L, 0L), Cutoff = c("2011", "2015", "2015", "2005", "2011"
)), .Names = c("2005", "2006", "2007", "2008", "2009", "2010",
"2011", "2012", "2013", "2014", "2015", "2016", "Cutoff"), row.names = c(NA,
5L), class = "data.frame")
给定以下数据框。我想在表格中添加 4 列。一列将截止年份之前的元素中的数字相加,一列将截止年份之后的元素中的数字相加。
然后再添加两列,其中一列添加截止前的年/列总数,另一列添加截止后的总年数/列。
截止年份不应包含在相应的行中。
所以决赛桌最终会是这样的:
structure(list(`2005` = c(0L, 0L, 0L, 2L, 1L), `2006` = c(0L,
0L, 0L, 1L, 1L), `2007` = c(1L, 0L, 1L, 0L, 3L), `2008` = c(1L,
0L, 0L, 4L, 3L), `2009` = c(1L, 0L, 0L, 2L, 3L), `2010` = c(0L,
0L, 0L, 5L, 0L), `2011` = c(0L, 0L, 0L, 0L, 1L), `2012` = c(0L,
0L, 0L, 4L, 1L), `2013` = c(1L, 0L, 1L, 0L, 0L), `2014` = c(0L,
0L, 2L, 0L, 9L), `2015` = c(0L, 0L, 1L, 0L, 2L), `2016` = c(0L,
0L, 0L, 0L, 0L), Cutoff = c("2011", "2015", "2015", "2005", "2011"
), Numbers_Before = c(3, 0, 4, 0, 11), Numbers_After = c(1, 0,
0, 16, 12), Years_Before = c(6, 10, 10, 0, 6), Years_After = c(5,
1, 1, 11, 5)), .Names = c("2005", "2006", "2007", "2008", "2009",
"2010", "2011", "2012", "2013", "2014", "2015", "2016", "Cutoff",
"Numbers_Before", "Numbers_After", "Years_Before", "Years_After"
), row.names = c(NA, 5L), class = "data.frame")
解决方案
我发现首先使用melt
将表格整理成整齐的格式然后使用一些 data.table 操作来计算年份或截止年份之前和之后的数字更容易。
library(data.table)
dt = setDT(structure(list(`2005` = c(0L, 0L, 0L, 2L, 1L), `2006` = c(0L,
0L, 0L, 1L, 1L), `2007` = c(1L, 0L, 1L, 0L, 3L), `2008` = c(1L,
0L, 0L, 4L, 3L), `2009` = c(1L, 0L, 0L, 2L, 3L), `2010` = c(0L,
0L, 0L, 5L, 0L), `2011` = c(0L, 0L, 0L, 0L, 1L), `2012` = c(0L,
0L, 0L, 4L, 1L), `2013` = c(1L, 0L, 1L, 0L, 0L), `2014` = c(0L,
0L, 2L, 0L, 9L), `2015` = c(0L, 0L, 1L, 0L, 2L), `2016` = c(0L,
0L, 0L, 0L, 0L), Cutoff = c("2011", "2015", "2015", "2005", "2011"
)), .Names = c("2005", "2006", "2007", "2008", "2009", "2010",
"2011", "2012", "2013", "2014", "2015", "2016", "Cutoff"), row.names = c(NA,
5L), class = "data.frame"))
dt[, row := rownames(dt)]
dt2 = melt(dt, id.vars = c('Cutoff', 'row'), variable.name = 'Year', variable.factor = F)
dt2[, Numbers_Before := ifelse(Year < Cutoff, value, 0)]
dt2[, Numbers_After := ifelse(Year > Cutoff, value, 0)]
dt2[, Years_Before := ifelse(Year < Cutoff, 1, 0)]
dt2[, Years_After := ifelse(Year > Cutoff, 1, 0)]
dt3 = dt2[, .(Numbers_Before = sum(Numbers_Before), Numbers_After = sum(Numbers_After),
Years_Before = sum(Years_Before), Years_After = sum(Years_After)), by = row]
dt = merge(dt,dt3, by = 'row')
> dt
row 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Cutoff Numbers_Before Numbers_After Years_Before Years_After
1: 1 0 0 1 1 1 0 0 0 1 0 0 0 2011 3 1 6 5
2: 2 0 0 0 0 0 0 0 0 0 0 0 0 2015 0 0 10 1
3: 3 0 0 1 0 0 0 0 0 1 2 1 0 2015 4 0 10 1
4: 4 2 1 0 4 2 5 0 4 0 0 0 0 2005 0 16 0 11
5: 5 1 1 3 3 3 0 1 1 0 9 2 0 2011 11 12 6 5
编辑:这里使用更聪明的数据表语法和 dcast 而不是 ifelses:
dt[, row := rownames(dt)]
dt2 = melt(dt, id.vars = c('Cutoff', 'row'), variable.name = 'Year', variable.factor = F)
dt2 = dt2[Year != Cutoff][, .(Numbers = sum(value), Years = .N), by = .(row, Year > Cutoff, Cutoff)]
dt2 = dcast(dt2, row + Cutoff ~ Year, value.var = c('Numbers', 'Years'), fill = 0)
dt = merge(dt, dt2, by = c('row', 'Cutoff'))
> dt
row Cutoff 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Numbers_FALSE
1: 1 2011 0 0 1 1 1 0 0 0 1 0 0 0 3
2: 2 2015 0 0 0 0 0 0 0 0 0 0 0 0 0
3: 3 2015 0 0 1 0 0 0 0 0 1 2 1 0 4
4: 4 2005 2 1 0 4 2 5 0 4 0 0 0 0 0
5: 5 2011 1 1 3 3 3 0 1 1 0 9 2 0 11
Numbers_TRUE Years_FALSE Years_TRUE
1: 1 6 5
2: 0 10 1
3: 0 10 1
4: 16 0 11
5: 12 6 5
推荐阅读
- java - 当数据库不为空时,我从 findAll Spring 数据和 h2 数据库中获取空列表
- java - 动态方法不匹配概念。它的输出是什么?
- javascript - 如何使用javascript挑战问题在森林中找到并格式化视觉上美观的树木图案
- amazon-web-services - Amazon Athena 错误打开 Hive 拆分 s3 路径并拒绝访问
- python - 在 Python Pandas 的创建函数中使用 .format() 从数据框中选择值时出错?
- python - 在 Telegram Python 中回复消息后如何停止代码?
- prometheus - 是否有任何方法可以获取用户的指标,例如 mem 和 cpu 使用情况?
- javascript - 如何在 Yup 结果中跳过空字符串?
- protocol-buffers - protobuf java 生成代码中的自定义
- c# - 通过 USB 的以太网 (USB NCM) 与 HoloLens2 的连接丢失!USB NCM 带宽有限?