r - Complex data frame and transposing data
Problem description
I have a data frame that looks like this:
ID Capital Instal Date1 Date2
2 500 25 a b
2 500 20 a c
2 450 15 a a
2 300 10 a f
2 250 0 a z
4 100 25 b a
4 90 20 b b
4 80 15 b a
4 75 10 b f
4 25 0 b z
I want to create a new data.frame from it, based on the row where Date1 = Date2. My new data frame B would look like this:
ID Date1 Capital Instal1 Instal2 Instal3 Instal4
2 a 450 15 10 0
4 b 90 20 15 10 0
So I want the new data.frame to consider only the data from the row where Date1 and Date2 are equal onward.
Solution
tidyverse
Here is a tidyverse approach (dplyr + tidyr):
library(tidyverse)
df2 <- df %>%
group_by(ID) %>%
filter(cumsum(Date1 == Date2) >0) %>%
transmute(Capital=Capital[1],Instal,Date1,colnames = paste0("Instal",seq(n()))) %>%
ungroup %>%
spread(colnames,Instal)
df2[is.na(df2)] <- 0 # omit if you'd rather have NA
# # A tibble: 2 x 7
# ID Capital Date1 Instal1 Instal2 Instal3 Instal4
# * <int> <int> <chr> <int> <int> <int> <int>
# 1 2 450 a 15 10 0 0
# 2 4 90 b 20 15 10 0
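The filter call removes, within each ID, the rows that come before the first row where Date1 == Date2.
With transmute we keep only the columns we need and create the column names that we are going to spread. We set every Capital value to the first one in the group, since it is the only one we need. ID is the grouping variable, so it is kept by default and cannot be modified inside transmute.
Then we ungroup and do a textbook spread.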
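To see why the cumsum filter keeps the right rows, here is a minimal base R sketch using only the date values of ID 2 from the example data. The variable names date1, date2, is_match, running and keep are made up for this illustration and do not appear in the answer's code.

```r
# Date columns for ID 2, copied from the example data
date1 <- c("a", "a", "a", "a", "a")
date2 <- c("b", "c", "a", "f", "z")

# TRUE only on the rows where the two dates match
is_match <- date1 == date2

# Running count of matches: 0 before the first match, at least 1 from it onward
running <- cumsum(is_match)

# Rows kept by the filter: the first match and everything after it
keep <- running > 0
print(keep)
```

Inside group_by(ID), dplyr evaluates the same expression once per group, so each ID gets its own running count.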
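base R
In base R we can use split and reshape following the same idea, with some tedious reformatting at the end to pad the narrower sub-data-frames so that they can be bound together.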
df_list <-
lapply(split(df,df$ID),function(x) {
x <- subset(x,cumsum(Date1==Date2)>0)
x <- transform(x, Capital=Capital[1], time = seq(nrow(x)))
reshape(x,idvar=c("ID","Capital","Date1"),direction="wide",sep="",drop="Date2")
})
all_names <- names(df_list[[which.max(lengths(df_list))]])
df_list_full <- lapply(df_list,function(x) {x[setdiff(all_names,names(x))] <- NA;x})
do.call(rbind, df_list_full)
# ID Capital Date1 Instal1 Instal2 Instal3 Instal4
# 2 2 450 a 15 10 0 NA
# 4 4 90 b 20 15 10 0
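The padding step at the end can be looked at in isolation. Below is a small sketch with two hypothetical one-row data frames, narrow and wide, standing in for the reshaped pieces of df_list. It assumes, as the code above does, that the widest piece contains every column name.

```r
# Two hypothetical one-row data frames of different widths
narrow <- data.frame(ID = 2, Instal1 = 15, Instal2 = 10, Instal3 = 0)
wide   <- data.frame(ID = 4, Instal1 = 20, Instal2 = 15, Instal3 = 10, Instal4 = 0)

# The widest frame defines the full set of column names
all_names <- names(wide)

# Add every column the narrow frame lacks, filled with NA
missing_cols <- setdiff(all_names, names(narrow))
narrow[missing_cols] <- NA

# Both frames now have the same columns, so rbind() can stack them
padded <- rbind(narrow, wide)
print(padded)
```

rbind on data frames matches columns by name, which is why appending the missing columns at the end of the narrower frame is enough.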
Data:
df <- read.table(text = "ID Capital Instal Date1 Date2
2 500 25 a b
2 500 20 a c
2 450 15 a a
2 300 10 a f
2 250 0 a z
4 100 25 b a
4 90 20 b b
4 80 15 b a
4 75 10 b f
4 25 0 b z", header = TRUE, stringsAsFactors = FALSE)