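I am trying to determine patient adherence to a specific medical treatment, but the function I wrote (using apply) only works on data frames with fewer than ~100 rows.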

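I have two related data frames, which I have trimmed down here to protect patient data: "Advice", which contains treatment-advice entries indexed by a unique patient identification number (UID)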

> head(Advice)
# A tibble: 6 x 4
# Groups:   UID [3]
       UID eyepartid Proctype     entereddatetime    
     <dbl> <chr>     <chr>        <dttm>             
1 11556127 1         Retina Laser 2017-06-14 12:54:18
2 11556127 2         Retina Laser 2017-06-14 12:54:18
3  2680380 2         Retina Laser 2017-06-14 10:40:22
4  2680380 1         Retina Laser 2017-06-14 10:40:22
5 11275381 2         Retina Laser 2017-06-14 13:01:04
6 11275381 1         Retina Laser 2017-06-14 13:01:04

and "Treatment", which contains entries recording when a patient actually received the advised treatment, and is also indexed by UID.

# A tibble: 6 x 4
       UID eyepartid lasertype               entereddatetime    
     <dbl>     <dbl> <chr>                   <dttm>             
1 11333944         1 Retina Laser Laser Type 2017-04-21 12:42:49
2 12022346         1 Yellow                  2017-11-01 09:18:42
3 12123496         2 Green                   2017-11-20 16:11:43
4 12291214         1 Yellow                  2017-12-23 10:21:45
5 11005906         2 Yellow                  2017-12-23 13:13:48
6 12341193         2 Green                   2018-01-19 09:12:26

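As a very rough estimate, the first analysis I want to do is to count how many times a patient came in within 30 days after the doctor advised the treatment (since most advice calls for 3 treatment sessions).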


library(dplyr)  # provides filter()

Advice$treatments <- apply(Advice, 1, 
function(x) {

  # get date of the advice entry
  AdvisedDay <- x["entereddatetime"]

  # take the subset of Treatment that has the correct UID and is within 30 days 
  ## of the advice entry
  TreatSubset <- filter(UID_Treatment, UID == x["UID"], 
  (difftime(Treatment$entereddatetime, AdvisedDay, units = "days") <= 30)) 

  # return the number of rows in TreatSubset
  nrow(TreatSubset)
})

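To make the intended result concrete, here is a minimal base-R sketch of the count described above, run on made-up data. The names `advice_toy` and `treatment_toy` and all of their values are hypothetical, and the `days >= 0` condition is an assumption on my part (that only visits on or after the advice date should count). It only illustrates the numbers I expect to get; it is not the function that is failing.

```r
# Made-up toy data (NOT real patient records); column names follow the tables above
advice_toy <- data.frame(
  UID = c(1, 1, 2),
  entereddatetime = as.POSIXct(
    c("2017-06-14 12:54:18", "2017-06-14 12:54:18", "2017-06-14 10:40:22"),
    tz = "UTC"
  )
)
treatment_toy <- data.frame(
  UID = c(1, 1, 1, 2),
  entereddatetime = as.POSIXct(
    c("2017-06-20 09:00:00", "2017-07-01 09:00:00",
      "2017-08-30 09:00:00", "2017-06-01 09:00:00"),
    tz = "UTC"
  )
)

# For each advice row, count treatments for the same UID that fall
# 0 to 30 days after the advice date (the lower bound is an assumption)
advice_toy$treatments <- vapply(seq_len(nrow(advice_toy)), function(i) {
  days <- as.numeric(difftime(treatment_toy$entereddatetime,
                              advice_toy$entereddatetime[i],
                              units = "days"))
  sum(treatment_toy$UID == advice_toy$UID[i] & days >= 0 & days <= 30)
}, integer(1))

print(advice_toy$treatments)
```

On this toy data the two rows for UID 1 should each get 2 (the August visit is too late), and the row for UID 2 should get 0 because its only visit predates the advice.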
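What I am struggling with is that the algorithm works fine when I call it on head(Advice) or on any slice of the Advice data frame with fewer than 100 rows, but when I call it on the whole Advice data frame, it returns zero for every row.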


adviceToy <- Advice[1:10, ]


# A tibble: 10 x 5
# Groups:   UID [7]
        UID eyepartid Proctype     entereddatetime     treatments
      <dbl> <chr>     <chr>        <dttm>                   <int>
 1 11556127 1         Retina Laser 2017-06-14 12:54:18          3
 2 11556127 2         Retina Laser 2017-06-14 12:54:18          3
 3  2680380 2         Retina Laser 2017-06-14 10:40:22          0
 4  2680380 1         Retina Laser 2017-06-14 10:40:22          0
 5 11275381 2         Retina Laser 2017-06-14 13:01:04          1
 6 11275381 1         Retina Laser 2017-06-14 13:01:04          1
 7 11557272 3         Retina Laser 2017-06-14 14:22:53          2
 8 11492720 2         Retina Laser 2017-06-14 13:04:41          2
 9 11030362 3         Retina Laser 2017-06-14 15:27:36          2
10 11244084 3         Retina Laser 2017-06-14 17:06:16          0


*Now running the function on the full Advice data frame* *No warning messages*

# A tibble: 6 x 5
# Groups:   UID [3]
       UID eyepartid Proctype     entereddatetime     treatments
     <dbl> <chr>     <chr>        <dttm>                   <int>
1 11556127 1         Retina Laser 2017-06-14 12:54:18          0
2 11556127 2         Retina Laser 2017-06-14 12:54:18          0
3  2680380 2         Retina Laser 2017-06-14 10:40:22          0
4  2680380 1         Retina Laser 2017-06-14 10:40:22          0
5 11275381 2         Retina Laser 2017-06-14 13:01:04          0
6 11275381 1         Retina Laser 2017-06-14 13:01:04          0



Note: I have already cleaned the data of any NA or NULL values.

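Edit: added some example code. Also, I noticed that my Advised data frame has some attributes attached to it. [image: screenshot of the data frame's attributes]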

