Recoding an ICD-10 variable

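Problem description

I need to do some recoding on a database. It is a medical administrative database, so it is large. The task is to recode diagnoses that are already coded in ICD-10. The toy data below stands in for the real database.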

ID<-(1:15)
Diag<-c("A001","A002","A003","A004","B001","B002","B003",
      "C001","C002","C003","C004","C005","C006","C007","C008")
Age<-round(rnorm(15,25,10))
DATA<-data.frame(ID,Diag,Age)

So I would like to:

Recode every Diag value that starts with "A" or "B" as "Disease 1".

Recode C001 to C004 as "Disease 2".

Recode C005 to C008 as "Disease 3".

Tags: r, dplyr

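Solution

We can use case_when from dplyr. Its conditions are checked in order, so the final TRUE branch picks up every code the first two rules did not match, which in this data is C005 to C008.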

library(dplyr)
library(stringr)

DATA %>%
  mutate(new = case_when(
    # first letter is A or B
    str_sub(Diag, 1, 1) %in% c('A', 'B') ~ 'Disease 1',
    # C001 to C004
    Diag %in% str_c('C00', 1:4) ~ 'Disease 2',
    # everything else (here: C005 to C008)
    TRUE ~ 'Disease 3'
  ))
#   ID Diag Age       new
#1   1 A001   9 Disease 1
#2   2 A002  37 Disease 1
#3   3 A003  27 Disease 1
#4   4 A004  31 Disease 1
#5   5 B001  22 Disease 1
#6   6 B002  23 Disease 1
#7   7 B003  30 Disease 1
#8   8 C001  38 Disease 2
#9   9 C002  24 Disease 2
#10 10 C003  25 Disease 2
#11 11 C004  33 Disease 2
#12 12 C005  26 Disease 3
#13 13 C006  45 Disease 3
#14 14 C007  20 Disease 3
#15 15 C008  22 Disease 3
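
In a real administrative database there will probably be codes outside these three groups, and the catch-all TRUE branch would silently label them "Disease 3". As a minimal sketch, assuming you would rather leave such codes as NA so they can be reviewed, the third rule can be spelled out explicitly. The data frame DATA2, the result RECODED, and the extra code "D001" are hypothetical and used only for illustration.

```r
library(dplyr)
library(stringr)

# Toy data from the question (Age omitted, it plays no role in the recoding),
# plus one hypothetical code ("D001") that none of the three rules covers.
DATA2 <- data.frame(
  ID = 1:16,
  Diag = c("A001", "A002", "A003", "A004", "B001", "B002", "B003",
           "C001", "C002", "C003", "C004", "C005", "C006", "C007", "C008",
           "D001")
)

RECODED <- DATA2 %>%
  mutate(new = case_when(
    # first letter is A or B
    str_sub(Diag, 1, 1) %in% c("A", "B") ~ "Disease 1",
    # C001 to C004
    Diag %in% str_c("C00", 1:4)          ~ "Disease 2",
    # C005 to C008, now listed explicitly
    Diag %in% str_c("C00", 5:8)          ~ "Disease 3",
    # anything not covered above stays missing instead of being mislabeled
    TRUE                                 ~ NA_character_
  ))

# Count rows per category; unmatched codes are reported under NA
table(RECODED$new, useNA = "ifany")
```

Checking the NA count after recoding is a quick way to see whether the rules cover every code that actually occurs in the data.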
