Change a value in one row conditional on a value in another row in R

Problem description

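I am classifying smokers and non-smokers at different time points. If someone does not smoke at a given time point but smoked at the previous one, their health status is "Intermediate". The problem is that each time point has its own row, so I need to set a value in one row conditional on a value in another row. How can I do this?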

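Here is an example data frame. Right now every non-smoker gets "Ideal" health, but that is wrong: if they smoked at the previous time point, their health should be "Intermediate".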

x <- data.frame(Participant = c(1,1,2,2,3),
                Time = c(1,2,1,2,1),
                Smoke_T1 = c(1,NA,0,NA,NA),
                Smoke_T2 = c(NA,0,NA,0,NA),
                Health = c("Poor",rep("Ideal",3),NA)
                )

This is the goal:

x <- data.frame(Participant = c(1,1,2,2,3),
                Time = c(1,2,1,2,1),
                Smoke_T1 = c(1,NA,0,NA,NA),
                Smoke_T2 = c(NA,0,NA,0,NA),
                Health = c("Poor","Intermediate",rep("Ideal",2),NA)
                )

I tried:

x2 <- group_by(x,Participant) %>% 
  mutate(Health[x$Time == 2] = case_when(
    x$Smoke_T1[x$Time == 1] == 1 & x$Smoke_T2[x$Time == 2] == 0 ~ "Intermediate"
      ))

It throws an error:

Error: unexpected '=' in:
"x2 <- group_by(x,Participant) %>% 
   mutate(Health[x$Time == "2"] ="

The solution doesn't have to use the tidyverse. I'm actually more familiar with base R, but I don't know how to condition on another row in base R either.

Tags: r, conditional-statements, data-wrangling

Solution


OK, first I cleaned up your time-point data so it is easier to work with:

x <- data.frame(Participant = c(1,1,2,2,3),
                Time = c(1,2,1,2,1),
                Smoke_T1 = c(1,NA,0,NA,NA),
                Smoke_T2 = c(NA,0,NA,0,NA),
                Health = c("Poor",rep("Ideal",3),NA)
                )

library(dplyr)

x2 <- x %>% 
  # merge the per-time-point smoking columns into one column (first non-NA value wins)
  mutate(Smoker = coalesce(Smoke_T1, Smoke_T2)) %>% 
  # keep only the columns we need, for a cleaner view
  select(Participant, Time, Smoker, Health)

Then this gives the result you want, I think:

x3 <- x2 %>% 
  group_by(Participant) %>%
  mutate(Health = case_when(
    Time == 1 & Smoker > 0 ~ "Poor",
    # lag() looks at the previous time point of the same participant
    lag(Smoker == 1) & Time == 2 & Smoker < 1 ~ "Intermediate",
    Time == 1 & Smoker < 1 ~ "Ideal",
    # if they didn't smoke at T1 but do at T2
    lag(Smoker < 1) & Time == 2 & Smoker > 0 ~ "Intermediate",
    # if they smoked at neither T1 nor T2
    lag(Smoker < 1) & Time == 2 & Smoker < 1 ~ "Ideal",
    # missing smoking status (Smoker == NA is always NA, so test with is.na())
    is.na(Smoker) ~ "Unknown"
  ))

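If I've missed any combination of T1/T2 values, just update the case_when() or let me know and I'll edit the answer. For example, a participant who smokes at both T1 and T2 currently matches none of the branches and gets NA at T2.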

Output:

# A tibble: 5 x 4
# Groups:   Participant [3]
  Participant  Time Smoker Health      
        <dbl> <dbl>  <dbl> <chr>       
1           1     1      1 Poor        
2           1     2      0 Intermediate
3           2     1      0 Ideal       
4           2     2      0 Ideal       
5           3     1     NA Unknown   
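The error in the question is a parse error: inside a call such as mutate(), the left-hand side of `=` must be a plain column name, so `Health[x$Time == 2] = ...` is rejected before anything runs. Also, `x$...` inside the pipeline refers to the whole ungrouped data frame, not the current group. A minimal sketch of the original attempt rewritten to compute the whole column instead, assuming rows can be sorted by Time, only T1 and T2 exist, and `x_fixed` is just an illustrative name:

```r
library(dplyr)

x <- data.frame(Participant = c(1, 1, 2, 2, 3),
                Time = c(1, 2, 1, 2, 1),
                Smoke_T1 = c(1, NA, 0, NA, NA),
                Smoke_T2 = c(NA, 0, NA, 0, NA),
                Health = c("Poor", rep("Ideal", 3), NA),
                stringsAsFactors = FALSE)

x_fixed <- x %>%
  # lag() depends on row order, so sort each participant's rows by time first
  arrange(Participant, Time) %>%
  group_by(Participant) %>%
  mutate(Health = if_else(
    # T2 row, the previous row (T1) was a smoker, and they no longer smoke;
    # %in% turns NA comparisons into FALSE so if_else() keeps the old value
    Time == 2 & lag(Smoke_T1) %in% 1 & Smoke_T2 %in% 0,
    "Intermediate",
    Health
  )) %>%
  ungroup()

x_fixed$Health
```

Here only the T2 row of participant 1 changes to "Intermediate"; every other row keeps its original Health value, which matches the goal data frame in the question.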

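Since the question mentions base R, here is a sketch of the same rule without dplyr. It looks up "the other row" explicitly: match() finds, for each row, the row of the same participant at `Time - 1`. This assumes time points are consecutive integers; `smoker`, `prev_row` and `prev_smoker` are illustrative names, not part of the original answer. Unlike lag(), it does not depend on row order.

```r
x <- data.frame(Participant = c(1, 1, 2, 2, 3),
                Time = c(1, 2, 1, 2, 1),
                Smoke_T1 = c(1, NA, 0, NA, NA),
                Smoke_T2 = c(NA, 0, NA, 0, NA),
                Health = c("Poor", rep("Ideal", 3), NA),
                stringsAsFactors = FALSE)

# one smoking column: Smoke_T1 where present, otherwise Smoke_T2
smoker <- ifelse(is.na(x$Smoke_T1), x$Smoke_T2, x$Smoke_T1)

# index of the same participant's row at the previous time point (NA if none)
prev_row <- match(paste(x$Participant, x$Time - 1),
                  paste(x$Participant, x$Time))

# smoking status at the previous time point; indexing with NA gives NA
prev_smoker <- smoker[prev_row]

# non-smoker now, smoker at the previous time point -> "Intermediate";
# which() drops NA comparisons, so only definite matches are changed
x$Health[which(prev_smoker == 1 & smoker == 0)] <- "Intermediate"

x$Health
```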