r - 在R中的查找表中按值拆分列
问题描述
我有一张桌子,每个 hpo_term 一行,所以一个病人每个 ID 可以有很多行。
ID hpo_term
123 kidney failure
123 hand tremor
123 kidney transplant
432 hypertension
432 exotropia
432 scissor gait
我还有另外两张表格,一张是肾脏术语,另一张是非肾脏术语,肾脏一个看起来像这样:
kidney failure
kidney transplant
hypertension
非肾脏的看起来像这样:
hand tremor
exotropia
scissor gait
我想要的结果是这样的表格:
ID kidney_hpo_term non_kidney_hpo_term
123 kidney failure;kidney transplant hand tremor
432 hypertension exotropia;scissor gait
实际上有数百名患者和数百个 HPO 术语。
我可以访问基础 R;dplyr 但我真的不知道如何解决这个问题。
您的帮助将不胜感激。
非常感谢
编辑:
真正的 table1 有更多不相关的额外列,并且每个 ID 总是相同的,我也想导入它。例如:
ID hpo_term year_of_birth affected_relative genome
123 kidney failure 2000 Y 38
123 hand tremor 2000 Y 38
123 kidney transplant 2000 Y 38
432 hypertension 1980 N 37
432 exotropia 1980 N 37
432 scissor gait 1980 N 37
解决方案
这是一个 dplyr 解决方案:
library(dplyr)
table1 = data.frame(ID = c(123,123,123,432,432,432),
hpo_term = c("kidney failure","hand tremor","kidney transplant","hypertension","exotropia","scissor gait"))
kid_terms = c("kidney failure","kidney transplant","hypertension")
nonkid_terms = c("hand tremor","exotropia","scissor gait")
table1$term_type = NA
table1$term_type[table1$hpo_term %in% kid_terms] = "kidney_hpo_term"
table1$term_type[table1$hpo_term %in% nonkid_terms] = "non_kidney_hpo_term"
table2 = table1 %>% group_by(ID,term_type) %>%
summarize(term_list=paste(hpo_term,collapse=";")) %>%
spread(term_type,term_list)
> table2
ID kidney_hpo_term non_kidney_hpo_term
1 123 kidney failure;kidney transplant hand tremor
2 432 hypertension exotropia;scissor gait
这是一个data.table
解决方案:
library(data.table)
table1 = data.table(ID = c(123,123,123,432,432,432),
hpo_term = c("kidney failure","hand tremor","kidney transplant","hypertension","exotropia","scissor gait"))
kid_terms = c("kidney failure","kidney transplant","hypertension")
nonkid_terms = c("hand tremor","exotropia","scissor gait")
table1$term_type = NA
table1$term_type[table1$hpo_term %in% kid_terms] = "kidney_hpo_term"
table1$term_type[table1$hpo_term %in% nonkid_terms] = "non_kidney_hpo_term"
table2 = table1[,.(term_list=paste(hpo_term,collapse=";")),by=.(ID,term_type)]
table3 = dcast(table2, ID~term_type, value.var = "term_list")
> table3
ID kidney_hpo_term non_kidney_hpo_term
1: 123 kidney failure;kidney transplant hand tremor
2: 432 hypertension exotropia;scissor gait
推荐阅读
- google-apps-script - 拒绝访问:DriveApp
- java - 递归函数具有 void 数据类型,但出现错误
- excel - 即使文件已打开并被其他用户使用,如何使 ssis excel 数据源任务不会失败?
- detox - Detox 的 `device.openURL` 无法在 Android 上运行
- node.js - 当我尝试连接 Mondo DB 时,event.bind 停止作为一个函数工作
- java - Java:在不创建许多线程的情况下向 websocket 发送消息
- excel - 检查多个工作表上的特定行是否有一个“x”(问卷)
- javascript - 在页面完成加载之前捕获 cookie 的值
- javascript - 如何将 React.Component 放入 CSS 内容属性(在 :before/:after 伪元素中)?
- c# - 如何从具有定义的最大颜色深度的屏幕中获取像素颜色?