r - Conducting a series of t-tests between two data frames with covariates
Problem description
I have two dataframes, one with covariates for patient samples, and one with methylation data for the samples. I need to perform t-tests to compare the methylation data by sex.
My dataframes look somewhat like this - Covariates:
"patient" "sex" "ethnicity"
sample1 p1 0 caucasian
sample2 p2 1 caucasian
sample3 p3 1 caucasian
sample4 p4 0 caucasian
sample5 p5 0 caucasian
sample6 p6 1 caucasian
and continues up to sample46
Methylation:
sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
probe1 0.1111 0.2222 0.3333 0.4444 0.5555 0.6666 0.7777 0.8888 0.9999 1.111
probe2 0.1111 0.2222 0.3333 0.4444 0.5555 0.6666 0.7777 0.8888 0.9999 1.111
probe3 0.1111 0.2222 0.3333 0.4444 0.5555 0.6666 0.7777 0.8888 0.9999 1.111
probe4 0.1111 0.2222 0.3333 0.4444 0.5555 0.6666 0.7777 0.8888 0.9999 1.111
and so on for 80,000 different probes and 46 different samples.
So if I want to do a series of t-tests comparing the methylation data by sex for the first 8 samples, could I just specify:
t.test(t(methylation[,1:8]) ~ covariates$sex)
Or is there a way that I can tie in the sample names (sample1, sample2...)? (Sorry in advance, I'm very new to both R and statistics)
Solution
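One easy way is to create a single data.frame, methyl_cov_df, and then use the formula interface of t.test. Note that cbind pairs rows by position, so the selected samples must appear in the same order, and in the same number, in both data frames.
Below is an example of a t-test of the probe1 values by sex for the first 6 samples (change the slice to match the number of samples you want):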
# combined data frame
methyl_cov_df <- cbind(t(methylation[,1:6]),covariates)
methyl_cov_df:
probe1 probe2 probe3 probe4 patient sex ethnicity
sample1 0.1111 0.1111 0.1111 0.1111 p1 0 caucasian
sample2 0.2222 0.2222 0.2222 0.2222 p2 1 caucasian
sample3 0.3333 0.3333 0.3333 0.3333 p3 1 caucasian
sample4 0.4444 0.4444 0.4444 0.4444 p4 0 caucasian
sample5 0.5555 0.5555 0.5555 0.5555 p5 0 caucasian
sample6 0.6666 0.6666 0.6666 0.6666 p6 1 caucasian
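To tie in the sample names rather than rely on row position, one option is to index both objects by the names they share. The following is only a sketch on the toy data from the question: the names covariates_shuffled and shared are made up for illustration, and the covariate rows are shuffled on purpose to show that the alignment does not depend on row order.

```r
# Toy covariates from the question (six samples).
covariates <- read.table(text = 'patient sex ethnicity
sample1 p1 0 caucasian
sample2 p2 1 caucasian
sample3 p3 1 caucasian
sample4 p4 0 caucasian
sample5 p5 0 caucasian
sample6 p6 1 caucasian', header = TRUE)

# Toy methylation data from the question: 4 probes x 10 samples.
beta_values <- c(0.1111, 0.2222, 0.3333, 0.4444, 0.5555,
                 0.6666, 0.7777, 0.8888, 0.9999, 1.111)
methylation <- as.data.frame(matrix(
  rep(beta_values, each = 4), nrow = 4,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:10))
))

# Hypothetical scenario: the covariate rows arrive in a different order.
covariates_shuffled <- covariates[c(3, 1, 6, 2, 5, 4), ]

# Samples present in both objects, in the column order of `methylation`.
shared <- intersect(colnames(methylation), rownames(covariates_shuffled))

# Index both sides by name so every row refers to the same sample.
methyl_cov_df <- cbind(t(methylation[, shared]),
                       covariates_shuffled[shared, ])
```

Because both sides are indexed by shared, samples that lack covariates (sample7 to sample10 in this toy data) are dropped instead of being paired with the wrong row.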
# t.test by formula: slice the data.frame to the number of samples wanted (6 here)
t.test(formula = probe1 ~ sex, data = methyl_cov_df[1:6, ])
Welch Two Sample t-test
data: probe1 by sex
t = -0.19612, df = 4, p-value = 0.8541
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.5613197 0.4872530
sample estimates:
mean in group 0 mean in group 1
0.3703333 0.4073667
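The test above covers a single probe. To repeat it for every probe, one possibility is to apply a small helper over the rows of the methylation matrix and collect the p-values. This is a sketch under assumptions, not part of the original answer: welch_p is a hypothetical helper, the sex vector is typed in by hand from the toy covariates, and the Benjamini-Hochberg adjustment is just one common choice for handling many tests.

```r
# Toy methylation data from the question: 4 probes x 10 samples.
beta_values <- c(0.1111, 0.2222, 0.3333, 0.4444, 0.5555,
                 0.6666, 0.7777, 0.8888, 0.9999, 1.111)
methylation <- as.data.frame(matrix(
  rep(beta_values, each = 4), nrow = 4,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:10))
))

# Sex for the six samples that have covariates, named by sample.
sex <- c(sample1 = 0, sample2 = 1, sample3 = 1,
         sample4 = 0, sample5 = 0, sample6 = 1)

# Keep only the samples with covariates, selected by name.
meth <- as.matrix(methylation[, names(sex)])

# Hypothetical helper: Welch two-sample t-test p-value for one probe.
welch_p <- function(x, group) {
  t.test(x[group == 0], x[group == 1])$p.value
}

# One p-value per probe (row), then a multiple-testing adjustment.
p_values   <- apply(meth, 1, welch_p, group = sex)
p_adjusted <- p.adjust(p_values, method = "BH")

results <- data.frame(p_value = p_values, p_adjusted = p_adjusted)
```

With 80,000 probes the unadjusted p-values will contain many false positives by chance alone, which is why some adjustment such as p.adjust is usually worth considering.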
Data:
covariates <- read.table(text = ' "patient" "sex" "ethnicity"
sample1 p1 0 caucasian
sample2 p2 1 caucasian
sample3 p3 1 caucasian
sample4 p4 0 caucasian
sample5 p5 0 caucasian
sample6 p6 1 caucasian', header = TRUE)
methylation <- read.table(text = " sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
probe1 0.1111 0.2222 0.3333 0.4444 0.5555 0.6666 0.7777 0.8888 0.9999 1.111
probe2 0.1111 0.2222 0.3333 0.4444 0.5555 0.6666 0.7777 0.8888 0.9999 1.111
probe3 0.1111 0.2222 0.3333 0.4444 0.5555 0.6666 0.7777 0.8888 0.9999 1.111
probe4 0.1111 0.2222 0.3333 0.4444 0.5555 0.6666 0.7777 0.8888 0.9999 1.111", header = TRUE)