How to change a column relative to another column and group

Question

I have a data frame with the following columns:

 PERNO      TPURP       loop
 1      Loop trip     1
 1      Loop trip     2
 1      home          2
 1      shopping      2
 2      work          1
 2      Loop trip     2
 2      school        2
 3      Looptrip      1
 4      work          1

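For each PERNO, if TPURP == "Loop trip", I want to add 1 to loop in the rows after that row.

For each PERNO, if a Loop trip row comes immediately after another Loop trip row, we do not add 1 after the first one, but we do after the second.

Expected output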

 PERNO      TPURP       loop
 1      Loop trip     1
 1      Loop trip     2
 1      home          3
 1      shopping      3
 2      work          1
 2      Loop trip     2
 2      school        3
 3      Looptrip      1
 4      work          1

Data

structure(list(PERNO = c(1, 1, 1, 1, 1, 1), TPURP = structure(c(8L, 
1L, 22L, 22L, 9L, 2L), .Label = c("(1) Working at home (for pay)", 
"(2) All other home activities", "(3) Work/Job", "(4) All other activities at work", 
"(5) Attending class", "(6) All other activities at school", 
"(7) Change type of transportation/transfer", "(8) Dropped off passenger", 
"(9) Picked up passenger", "(10) Other, specify - transportation", 
"(11) Work/Business related", "(12) Service Private Vehicle", 
"(13) Routine Shopping", "(14) Shopping for major purchases", 
"(15) Household errands", "(16) Personal Business", "(17) Eat meal outside of home", 
"(18) Health care", "(19) Civic/Religious activities", "(20) Recreation/Entertainment", 
"(21) Visit friends/relative", "(24) Loop trip", "(97) Other, specify"
), class = "factor"), loop = c(1, 1, 2, 2, 2, 2)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))

Tags: r, dataframe

Answer


Using dplyr, we can group_by PERNO and increase the value of loop after the last occurrence of "Loop trip" in each group.

library(dplyr)

df %>%
  group_by(PERNO) %>%
  # add 1 to loop for rows after the last "Loop trip" row of the group
  mutate(loop1 = ifelse(any(TPURP == "Loop trip") &
                          row_number() > max(which(TPURP == "Loop trip")),
                        loop + 1, loop))

# PERNO TPURP      loop loop1
#  <int> <fct>     <int> <dbl>
#1     1 Loop trip     1     1
#2     1 Loop trip     2     2
#3     1 home          2     3
#4     1 shopping      2     3
#5     2 work          1     1
#6     2 Loop trip     2     2
#7     2 school        2     3
#8     3 Looptrip      1     1
#9     4 work          1     1

This returns a warning message for any group that has no "Loop trip" row, but the warning can be ignored.

Data
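
The warning comes from calling max() on an empty index vector. If you would rather avoid it than ignore it, one option is to handle groups without a match explicitly. The following is only a minimal base R sketch: the helper name bump_after_last and the small df built here are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): add 1 to `loop` in every
# row that comes after the last TRUE in `is_loop`.
bump_after_last <- function(loop, is_loop) {
  idx <- which(is_loop)
  if (length(idx) == 0L) return(loop)  # no match in this group: nothing to change
  loop + (seq_along(loop) > max(idx))
}

# Small example in the shape of the question's data
df <- data.frame(
  PERNO = c(1, 1, 1, 1, 2, 2, 2, 3, 4),
  TPURP = c("Loop trip", "Loop trip", "home", "shopping",
            "work", "Loop trip", "school", "Looptrip", "work"),
  loop  = c(1, 2, 2, 2, 1, 2, 2, 1, 1)
)

# Apply the helper per PERNO group with split()/Map()/unsplit()
df$loop1 <- unsplit(
  Map(bump_after_last,
      split(df$loop, df$PERNO),
      split(df$TPURP == "Loop trip", df$PERNO)),
  df$PERNO
)

df$loop1
```

Inside the dplyr pipeline above, the same helper could presumably be called after group_by(PERNO) as mutate(loop1 = bump_after_last(loop, TPURP == "Loop trip")).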

df <- structure(list(PERNO = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L), 
TPURP = structure(c(2L, 2L, 1L, 5L, 6L, 2L, 4L, 3L, 6L), .Label = c("home", 
"Loop trip", "Looptrip", "school", "shopping", "work"), class = "factor"), 
loop = c(1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L)), class = "data.frame", 
row.names = c(NA, -9L))

Alternatively, as @Sotos mentioned, we can use grepl/grep for a partial match instead of an exact match. On the updated dataset, we can do

df %>% 
  group_by(PERNO) %>%
  dplyr::mutate(loop1 = ifelse(any(grepl('Loop', TPURP)) & 
     row_number() > max(grep('Loop', TPURP)), loop + 1, loop))

#   PERNO TPURP                          loop loop1
#   <dbl> <fct>                         <dbl> <dbl>
#1     1 (8) Dropped off passenger         1     1
#2     1 (1) Working at home (for pay)     1     1
#3     1 (24) Loop trip                    2     2
#4     1 (24) Loop trip                    2     2
#5     1 (9) Picked up passenger           2     3
#6     1 (2) All other home activities     2     3
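
To see why the partial match matters on the updated data, here is a small sketch comparing the two tests. The purpose vector is a made-up sample using labels that appear in the question. Note that a pattern as broad as 'Loop' also matches "Looptrip", which the exact comparison in the first version did not.

```r
# Made-up sample of labels, taken from the question's data
purpose <- c("(24) Loop trip", "Looptrip", "(13) Routine Shopping")

# Exact comparison: the numbered label is not equal to "Loop trip"
exact <- purpose == "Loop trip"

# Substring match: TRUE wherever "Loop" occurs anywhere in the label
partial <- grepl("Loop", purpose, fixed = TRUE)

# grep() returns the matching positions instead of a logical vector
positions <- grep("Loop", purpose, fixed = TRUE)

exact      # FALSE FALSE FALSE
partial    # TRUE  TRUE  FALSE
positions  # 1 2
```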
