首页 > 解决方案 > Assign values of a new column based on the frequency of a special pattern in dataframe

问题描述

I would like to create another column of a data frame that groups each member in the first column based on the order.

Here is a reproducible demo:

df1=c("Alex","23","ID #:123", "John","26","ID #:564")
df1=data.frame(df1)
library(dplyr)
library(data.table)
df1 %>% mutate(group= ifelse(df1 %like% "ID #:",1,NA ) )

This was the output from the demo:

df1        group
1     Alex    NA
2       23    NA
3 ID #:123     1
4     John    NA
5       26    NA
6 ID #:564     1

This is what I want:

 df1         group
 1     Alex     1
 2       23     1
 3 ID #:123     1
 4     John     2
 5       26     2
 6 ID #:564     2

So I want to have a group column indicates each member in order.

I appreciate in advance for any reply or thoughts!

标签: rdataframedplyr

解决方案


用 first 转移条件,lag然后做一个cumsum

df1 %>% 
    mutate(group= cumsum(lag(df1 %like% "ID #:", default = 1)))

#       df1 group
#1     Alex     1
#2       23     1
#3 ID #:123     1
#4     John     2
#5       26     2
#6 ID #:564     2

细节:

df1 %>% 
    mutate(
        # calculate the condition
        cond = df1 %like% "ID #:", 
        # shift the condition down and fill the first value with 1
        lag_cond = lag(cond, default = 1),
        # increase the group when the condition is TRUE (ID encountered)
        group= cumsum(lag_cond))

#       df1  cond lag_cond group
#1     Alex FALSE     TRUE     1
#2       23 FALSE    FALSE     1
#3 ID #:123  TRUE    FALSE     1
#4     John FALSE     TRUE     2
#5       26 FALSE    FALSE     2
#6 ID #:564  TRUE    FALSE     2

推荐阅读