r - Summarizing a data frame that contains NAs
Question
I have 3 different classes of mutations, namely
"CNA" "MUTATIONS" "STRUCTURAL_VARIANT"
f <- dput(e)
structure(list(track_name = c("AR", "ASCL1", "ATOH1", "PRDM1",
"DLX1", "DLX2", "EPAS1", "ETV2", "EYA2", "FOXG1", "FOXC2", "GATA1",
"GATA2", "GATA3", "GATA4", "GATA6", "GBX1", "GLI2", "GLI3", "MNX1"
), track_type = c("CNA", "CNA", "CNA", "CNA", "CNA", "CNA", "CNA",
"CNA", "CNA", "CNA", "CNA", "CNA", "CNA", "CNA", "CNA", "CNA",
"CNA", "CNA", "CNA", "CNA"), `TCGA-AB-2929` = c("amp_rec", NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "Amplification", NA, NA,
NA, NA, NA, NA, NA, NA), aml_ohsu_2018_1408 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
aml_ohsu_2018_1992 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_)), row.names = c(NA, -20L), class = c("tbl_df",
"tbl", "data.frame"))
This gives me a data frame like this:
track_name track_type `TCGA-AB-2929` aml_ohsu_2018_1408 aml_ohsu_2018_1992
<chr> <chr> <chr> <chr> <chr>
1 AR CNA amp_rec NA NA
2 ASCL1 CNA NA NA NA
3 ATOH1 CNA NA NA NA
4 PRDM1 CNA NA NA NA
5 DLX1 CNA NA NA NA
6 DLX2 CNA NA NA NA
7 EPAS1 CNA NA NA NA
8 ETV2 CNA NA NA NA
9 EYA2 CNA NA NA NA
10 FOXG1 CNA NA NA NA
11 FOXC2 CNA NA NA NA
12 GATA1 CNA Amplification NA NA
13 GATA2 CNA NA NA NA
14 GATA3 CNA NA NA NA
15 GATA4 CNA NA NA NA
16 GATA6 CNA NA NA NA
17 GBX1 CNA NA NA NA
18 GLI2 CNA NA NA NA
19 GLI3 CNA NA NA NA
20 MNX1 CNA NA NA NA
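This is a small subset of my data. The first column contains the gene, the second column contains the mutation class, and each remaining column is one sample.
I am trying to find the distribution of mutations for each gene within these classes across the samples. The columns after the second hold the mutation calls for each sample, for example
Amplification, Inframe Mutation (putative passenger), Deep Deletion, and Missense Mutation (putative passenger); all other cells are NA.
In my example data frame there is an observation like this: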
GATA1 CNA Amplification
This is what I am doing at the moment:
table(Store2df$track_name, Store2df$track_type) %>% prop.table() %>% round(2)
Is there a better way to summarize this?
Answer
Not necessarily a better way, but if you are using dplyr, you can do this:
library(dplyr)
e %>%
count(track_name, track_type) %>%
mutate(n = round(prop.table(n), 2))
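This returns the data in long format: one row per track_name/track_type combination, with n holding that combination's share of all rows.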
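The count above only tallies gene/class pairs and does not look at the mutation calls stored in the sample columns. If the goal is the distribution of the calls themselves, one possible approach is to reshape the sample columns to long format, drop the NA cells, and then count. This is a sketch under assumptions, not part of the original answer: it relies on tidyr being installed, it treats every column after track_type as a sample, and `e_small`, `sample`, `mutation`, and `prop` are names made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical three-row stand-in for `e`, so the example runs on its own.
e_small <- data.frame(
  track_name = c("AR", "ASCL1", "GATA1"),
  track_type = c("CNA", "CNA", "CNA"),
  `TCGA-AB-2929` = c("amp_rec", NA, "Amplification"),
  aml_ohsu_2018_1408 = c(NA_character_, NA_character_, NA_character_),
  check.names = FALSE,       # keep the hyphenated sample name unchanged
  stringsAsFactors = FALSE
)

mutation_summary <- e_small %>%
  # One row per gene/sample pair; cells that are NA are dropped.
  pivot_longer(
    cols = -c(track_name, track_type),
    names_to = "sample",
    values_to = "mutation",
    values_drop_na = TRUE
  ) %>%
  # Tally each mutation call per gene and class.
  count(track_name, track_type, mutation) %>%
  # Share of all non-NA calls, rounded as in the answer above.
  mutate(prop = round(n / sum(n), 2))

print(mutation_summary)
```

On the real data, replacing `e_small` with `e` should work the same way, provided all sample columns are character columns. Genes with no call in any sample disappear from the result because their cells are all NA.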