Create a new column based on a condition

Problem description

I have a data frame in the following format.

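When a user buys a new item, they are assigned a unique id value. If the same user then buys another item, the child column of that row holds the previous id.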

df <- data.frame(id= c('s123','s1004','s1009','s1010'),child = c("",'s123','s1004',""))

> df
     id child
1  s123      
2 s1004  s123
3 s1009 s1004
4 s1010      

Now I want to create a new column, parent, that holds the initial id value of each chain.

expect_df <- data.frame(id= c('s123','s1004','s1009','s1010'),child = c("",'s123','s1004',""),parent = c('s123','s123','s123','s1010'))

> expect_df

     id child parent
1  s123         s123
2 s1004  s123   s123
3 s1009 s1004   s123
4 s1010        s1010

Tags: r, dplyr, data.table

Solution


Data (make sure your input entries are characters, not factors, and replace "" with NA):

df <- data.frame(id = c('s123','s1004','s1009','s1010'), child = c(NA,'s123','s1004',NA), stringsAsFactors = FALSE)
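
If the data already exists in the question's format (empty strings, possibly factors), it can be converted rather than retyped. The following is a minimal sketch of one way to do that; the name `raw` is only an illustrative placeholder, and `stringsAsFactors = TRUE` is set on purpose to mimic the factor columns that older R versions create by default.

```r
# `raw` is a hypothetical stand-in for data in the question's original format
raw <- data.frame(id = c("s123", "s1004", "s1009", "s1010"),
                  child = c("", "s123", "s1004", ""),
                  stringsAsFactors = TRUE)

df <- raw
df[] <- lapply(df, as.character)   # factor columns -> character columns
df$child[df$child == ""] <- NA     # empty string -> NA
```

After this, `df` should match the data frame built by hand above.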

Code:

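df$parent <- NA_character_

repeat {
    # Seed: id of the first row whose parent is still unassigned
    sid <- df$id[which(is.na(df$parent))[1]]

    # Flag rows linked to the seed; `<<-` grows sid so later rows in the chain match too
    linked <- apply(df, 1, function(x) {
        x <- na.omit(x)
        if (!any(x %in% sid)) return(FALSE)
        sid <<- c(sid, x)
        TRUE
    })
    df$parent[linked] <- sid[1]

    # Stop once every row has a parent
    if (!anyNA(df$parent)) break
}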

Result:

#      id child parent
# 1  s123  <NA>   s123
# 2 s1004  s123   s123
# 3 s1009 s1004   s123
# 4 s1010  <NA>  s1010
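
The loop above relies on each row appearing after the row it points to. As an alternative, here is a sketch that follows every row's child link back to the start of its chain using `match()`, so row order should not matter. This is not the answer's method; `find_root` is a hypothetical helper name, and the sketch assumes ids are unique and that a missing link is stored as NA.

```r
df <- data.frame(id = c("s123", "s1004", "s1009", "s1010"),
                 child = c(NA, "s123", "s1004", NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: walk each id back along `child` until no link is left
find_root <- function(id, child) {
  root <- id
  # A chain has at most length(id) links, so this bound also guards against cycles
  for (i in seq_along(id)) {
    prev <- child[match(root, id)]   # previous id of each current root, NA if none
    step <- !is.na(prev)
    if (!any(step)) break            # every row has reached the start of its chain
    root[step] <- prev[step]
  }
  root
}

df$parent <- find_root(df$id, df$child)
```

Each pass moves all unfinished rows one link closer to the start of their chain, so the number of passes depends on the longest chain rather than on the number of chains.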
