Merging tables in R using multiple fields in the left table and one field in the right table

Problem description

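I have two data frames that I want to merge. The first data frame (hospital_df) contains a case ID followed by the diagnosis codes for diagnoses 1, 2, 3, 4, and so on. The second data frame holds a description for each unique diagnosis code, mapping codes to descriptions: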

id <- c(1, 1, 5, 5, 5, 40, 40, 99, 99, 99, 99)
principal_diagnosis <- c('AA', 'CS', 'AA', 'EG', 'SD', 'DF', 'DD', 'AA', 'DF', 'HG', 'SD')
other_diagnosis <- c('AB', 'CF', 'AD', 'AA', 'SS', 'DS', 'DF', 'AG', 'AB', 'FG', 'VV')
other_diagnosis2 <- c('XX', 'DD', 'SQ', 'SD', 'DF', 'CV', 'DA', 'FD', 'SS', 'WV', 'AA')
hospital_df <- data.frame(id,principal_diagnosis, other_diagnosis, other_diagnosis2)

principal_diagnosis_codes <- c('AA', 'CS', 'AA', 'EG', 'SD', 'DF', 'DD', 'AA', 'DF', 'HG', 'SD', 'AB', 'CF', 'AD', 'SS', 'DS', 'AG', 'FG', 'VV', 'XX', 'SQ', 'CV', 'DA', 'FD', 'WV')
length(principal_diagnosis_codes)
description <- c('disease1',
                 'disease2',
                 'disease3',
                 'disease4',
                 'disease5',
                 'disease6',
                 'disease7',
                 'disease8',
                 'disease9',
                 'disease10',
                 'disease11',
                 'disease12',
                 'disease13',
                 'disease14',
                 'disease15',
                 'disease16',
                 'disease17',
                 'disease18',
                 'disease19',
                 'disease20',
                 'disease21',
                 'disease22',
                 'disease23',
                 'disease24',
                 'disease25')

diagnosis_codes_df <- data.frame(principal_diagnosis_codes, description)

I would like the output to look like this:

ID  principal_diagnosis     other_diagnosis    other_diagnosis2
1   disease1                disease12          disease20
1   disease2                disease13          disease7
5   disease1                disease14          disease21

I tried:

mergedData <- merge(hospital_df, diagnosis_codes_df, by.x=c("principal_diagnosis"),
                    by.y=c("principal_diagnosis_codes"))

mergedData <- merge(mergedData, diagnosis_codes_df, by.x=c("other_diagnosis"),
                    by.y=c("principal_diagnosis_codes"))

mergedData <- merge(mergedData, diagnosis_codes_df, by.x=c("other_diagnosis2"),
                    by.y=c("principal_diagnosis_codes"))

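But this seems inefficient, and it also gives me columns named description, description.y, description.x, and so on, which is a bit confusing. Can anyone show me a better way? I'm new to R, so a detailed explanation would be very helpful. Thanks in advance!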

Tags: r, dplyr

Solution


For each code, this base R code takes the description of the first matching entry in the lookup table.

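# Compare each code against the code column only; [1] keeps the first match.
hospital_df[-1] <- sapply(unlist(hospital_df[-1]), function(x) diagnosis_codes_df$description[x == diagnosis_codes_df$principal_diagnosis_codes][1])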


   id principal_diagnosis other_diagnosis other_diagnosis2
1   1            disease1       disease12        disease20
2   1            disease2       disease13         disease7
3   5            disease1       disease14        disease21
4   5            disease4        disease1         disease5
5   5            disease5       disease15         disease6
6  40            disease6       disease16        disease22
7  40            disease7        disease6        disease23
8  99            disease1       disease17        disease24
9  99            disease6       disease12        disease15
10 99           disease10       disease18        disease25
11 99            disease5       disease19         disease1
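
The same first-match lookup can also be written with match(), which may be easier to read than the one-liner. This is only a sketch of an alternative, not part of the original answer. It assumes R 4.0 or later (or stringsAsFactors = FALSE), so that the columns are character vectors, and the helper name code_cols is made up for illustration. match() returns the position of the first occurrence of each code in the lookup column and NA for codes that are not found, so duplicated codes such as 'AA' resolve to their first description. Because each column is replaced in place, no description.x / description.y columns appear. merge() produces those because it adds suffixes to non-key columns that share a name.

```r
# Rebuild the question's data so this sketch runs on its own.
hospital_df <- data.frame(
  id = c(1, 1, 5, 5, 5, 40, 40, 99, 99, 99, 99),
  principal_diagnosis = c('AA', 'CS', 'AA', 'EG', 'SD', 'DF', 'DD', 'AA', 'DF', 'HG', 'SD'),
  other_diagnosis = c('AB', 'CF', 'AD', 'AA', 'SS', 'DS', 'DF', 'AG', 'AB', 'FG', 'VV'),
  other_diagnosis2 = c('XX', 'DD', 'SQ', 'SD', 'DF', 'CV', 'DA', 'FD', 'SS', 'WV', 'AA'),
  stringsAsFactors = FALSE
)
diagnosis_codes_df <- data.frame(
  principal_diagnosis_codes = c('AA', 'CS', 'AA', 'EG', 'SD', 'DF', 'DD', 'AA', 'DF',
                                'HG', 'SD', 'AB', 'CF', 'AD', 'SS', 'DS', 'AG', 'FG',
                                'VV', 'XX', 'SQ', 'CV', 'DA', 'FD', 'WV'),
  description = paste0('disease', 1:25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every column except the id holds diagnosis codes.
code_cols <- setdiff(names(hospital_df), "id")

# For each code column, find the row of the first matching code in the
# lookup table and pull the description from that row (NA if no match).
hospital_df[code_cols] <- lapply(hospital_df[code_cols], function(codes) {
  diagnosis_codes_df$description[
    match(codes, diagnosis_codes_df$principal_diagnosis_codes)
  ]
})

hospital_df
```

Using lapply() over the columns keeps each result the same length as the column it replaces, so the assignment does not depend on how a long flat vector is refilled into the data frame.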
