r - Merge replicate scores but flag discrepancies
Question
This is what I have:
df <- structure(list(Sample = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L,
4L), .Label = c("19-0001", "19-0002", "19-0003", "19-0004"), class = "factor"),
Replicate = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), X24854000 = structure(c(1L,
2L, 2L, 1L, 2L, 2L, 1L, 1L), .Label = c("", "CC"), class = "factor"),
X24854056 = structure(c(3L, 3L, 2L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"AA", "GG"), class = "factor"), X24854764 = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "TA", class = "factor"),
X24854903 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("",
"CT"), class = "factor"), X24855066 = structure(c(1L, 1L,
3L, 3L, 2L, 2L, 2L, 2L), .Label = c("", "CA", "CC"), class = "factor"),
X24855114 = structure(c(2L, 1L, 3L, 3L, 2L, 2L, 2L, 2L), .Label = c("",
"GA", "GG"), class = "factor"), X24855316 = structure(c(2L,
2L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("", "TC"), class = "factor"),
X24855449 = structure(c(1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("CC",
"GG"), class = "factor"), X24855925 = structure(c(2L, 1L,
1L, 3L, 2L, 2L, 1L, 1L), .Label = c("", "GA", "GG"), class = "factor"),
X24856070 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("CC",
"CT"), class = "factor"), X24856086 = structure(c(2L, 1L,
2L, 2L, 2L, 2L, 2L, 2L), .Label = c("CC", "CT"), class = "factor"),
X24856329 = structure(c(2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("",
"AG"), class = "factor"), X24856389 = structure(c(2L, 1L,
1L, 1L, 2L, 2L, 2L, 2L), .Label = c("", "GG"), class = "factor"),
X24857235 = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("",
"CT"), class = "factor"), X24857350 = structure(c(3L, 3L,
1L, 1L, 2L, 2L, 1L, 1L), .Label = c("", "GA", "GG"), class = "factor"),
X24857404 = structure(c(1L, 3L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("",
"AT", "TT"), class = "factor")), class = "data.frame", row.names = c(NA,
-8L))
This produces the following table (blank cells are shown as ''):
Sample Replicate X24854000 X24854056 X24854764 X24854903 X24855066 X24855114 X24855316 X24855449 X24855925 X24856070 X24856086 X24856329 X24856389 X24857235 X24857350 X24857404
19-0001 1 '' GG TA '' '' GA TC CC GA CT CT AG GG '' GG ''
19-0001 2 CC GG TA '' '' '' TC GG '' CC CC '' '' '' GG TT
19-0002 1 CC AA TA '' CC GG '' GG '' CC CT AG '' '' '' ''
19-0002 2 '' '' TA '' CC GG '' GG GG CC CT AG '' '' '' ''
19-0003 1 CC '' TA CT CA GA TC CC GA CC CT AG GG CT GA AT
19-0003 2 CC '' TA CT CA GA TC CC GA CC CT AG GG CT GA AT
19-0004 1 '' '' TA '' CA GA TC CC '' CC CT AG GG CT '' ''
19-0004 2 '' '' TA '' CA GA '' CC '' CC CT AG GG '' '' ''
This is what I want (blank cells are shown as ''):
Sample Replicate X24854000 X24854056 X24854764 X24854903 X24855066 X24855114 X24855316 X24855449 X24855925 X24856070 X24856086 X24856329 X24856389 X24857235 X24857350 X24857404
19-0001 1 CC GG TA '' '' GA TC 99 GA 99 99 AG GG '' GG TT
19-0002 1 CC AA TA '' CC GG '' GG GG CC CT AG '' '' '' ''
19-0003 1 CC '' TA CT CA GA TC CC GA CC CT AG GG CT GA AT
19-0004 1 '' '' TA '' CA GA TC CC '' CC CT AG GG CT '' ''
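Replicates 1 and 2 should be merged into a single row per sample. A missing score can be filled in from the other replicate and identical scores are kept, but any mismatch should be replaced with "99" so those cells can be removed later.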
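The merge rule can be stated per SNP column: drop the blank scores, keep the score if the remaining replicates agree, return "99" if they disagree, and stay blank if every replicate is blank (which is what the desired output above shows). A minimal base R sketch of that rule for one column of one sample follows; `merge_scores` is a hypothetical helper name, not something from the original post.

```r
# Hypothetical helper: merge the replicate scores of one SNP column
# for one sample, following the rule described in the question.
merge_scores <- function(x) {
  x <- as.character(x)                        # factors -> character
  calls <- unique(x[!is.na(x) & x != ""])     # distinct non-blank scores
  if (length(calls) == 0) {
    ""                                        # blank in every replicate: stay blank
  } else if (length(calls) == 1) {
    calls                                     # blank or identical: keep the score
  } else {
    "99"                                      # replicates disagree: flag it
  }
}

merge_scores(c("", "CC"))   # "CC"
merge_scores(c("CC", "GG")) # "99"
merge_scores(c("", ""))     # ""
```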
I tried:
data_merge <- df %>%
  group_by(Sample) %>%
  # statement, if_true and if_false are placeholders I could not fill in
  summarise_all(~ ifelse(statement, if_true, if_false))
This is only a subset of the data; the real data has 44 X columns.
Solution
Here is one option:
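library(dplyr)

df %>%
  # factors -> character so the scores can be compared as strings
  mutate_if(is.factor, as.character) %>%
  group_by(Sample) %>%
  summarise_at(
    vars(starts_with("X")),
    # exactly one distinct non-blank score -> keep it; otherwise
    # (a mismatch, or blank in every replicate) -> "99"
    ~if_else(length(unique(.x[.x != ""])) == 1, first(.x[.x != ""]), "99"))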
## A tibble: 4 x 17
# Sample X24854000 X24854056 X24854764 X24854903 X24855066 X24855114 X24855316
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 19-00… CC GG TA 99 99 GA TC
#2 19-00… CC AA TA 99 CC GG 99
#3 19-00… CC 99 TA CT CA GA TC
#4 19-00… 99 99 TA 99 CA GA TC
## … with 9 more variables: X24855449 <chr>, X24855925 <chr>, X24856070 <chr>,
## X24856086 <chr>, X24856329 <chr>, X24856389 <chr>, X24857235 <chr>,
## X24857350 <chr>, X24857404 <chr>
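Note that this output marks a column that is blank in both replicates as "99" (for example X24854903 for 19-0001), whereas the desired output in the question leaves such cells blank, and it drops the Replicate column. A base R sketch that follows the question's rule more literally is below. It swaps the dplyr pipeline for `split()`/`lapply()`, re-creates only four of the sample columns to stay short, and repeats the hypothetical `merge_scores` helper so it runs on its own. Setting Replicate to 1 mirrors the desired output and is an assumption about what should be kept.

```r
# Hypothetical helper: "" if all blank, the score if replicates agree, else "99"
merge_scores <- function(x) {
  x <- as.character(x)
  calls <- unique(x[!is.na(x) & x != ""])
  if (length(calls) == 0) "" else if (length(calls) == 1) calls else "99"
}

# Four columns of the sample data from the question
df <- read.table(text =
"Sample Replicate X24854000 X24854903 X24855449 X24856086
19-0001 1 '' '' CC CT
19-0001 2 CC '' GG CC
19-0002 1 CC '' GG CT
19-0002 2 '' '' GG CT
19-0003 1 CC CT CC CT
19-0003 2 CC CT CC CT
19-0004 1 '' '' CC CT
19-0004 2 '' '' CC CT", header = TRUE, stringsAsFactors = FALSE)

# Every column starting with "X", so the same code covers 16 or 44 SNPs
snp_cols <- grep("^X", names(df), value = TRUE)

# One output row per sample; each SNP column merged across replicates
merged <- do.call(rbind, lapply(split(df, df$Sample), function(d) {
  out <- data.frame(Sample = d$Sample[1], Replicate = 1L,
                    stringsAsFactors = FALSE)
  for (col in snp_cols) out[[col]] <- merge_scores(d[[col]])
  out
}))
rownames(merged) <- NULL
merged
```

Because the columns are picked with `grep("^X", ...)`, nothing needs to change when the full 44-column data set is used.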
Sample data
df <- read.table(text =
"Sample Replicate X24854000 X24854056 X24854764 X24854903 X24855066 X24855114 X24855316 X24855449 X24855925 X24856070 X24856086 X24856329 X24856389 X24857235 X24857350 X24857404
19-0001 1 '' GG TA '' '' GA TC CC GA CT CT AG GG '' GG ''
19-0001 2 CC GG TA '' '' '' TC GG '' CC CC '' '' '' GG TT
19-0002 1 CC AA TA '' CC GG '' GG '' CC CT AG '' '' '' ''
19-0002 2 '' '' TA '' CC GG '' GG GG CC CT AG '' '' '' ''
19-0003 1 CC '' TA CT CA GA TC CC GA CC CT AG GG CT GA AT
19-0003 2 CC '' TA CT CA GA TC CC GA CC CT AG GG CT GA AT
19-0004 1 '' '' TA '' CA GA TC CC '' CC CT AG GG CT '' ''
19-0004 2 '' '' TA '' CA GA '' CC '' CC CT AG GG '' '' ''", header = TRUE)