Conducting a series of t-tests between two data frames with covariates

Problem description

I have two dataframes, one with covariates for patient samples, and one with methylation data for the samples. I need to perform t-tests to compare the methylation data by sex.

My dataframes look somewhat like this - Covariates:

        "patient"   "sex"   "ethnicity"
sample1    p1         0      caucasian
sample2    p2         1      caucasian
sample3    p3         1      caucasian
sample4    p4         0      caucasian
sample5    p5         0      caucasian
sample6    p6         1      caucasian

and continues up to sample46

Methylation:

       sample1  sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
probe1  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe2  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe3  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe4  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111

and so on for 80,000 different probes and 46 different samples. So if I want to do a series of t-tests comparing the methylation data by sex for the first 8 samples, could I just specify: t.test(t(methylation[,1:8]) ~ covariates$sex)? Or is there a way that I can tie in the sample names (sample1, sample2...)? (Sorry in advance, I'm very new to both R and statistics)

Tags: r, statistics

Solution


One easy way is to combine the two tables into a single data.frame, methyl_cov_df, with one row per sample, and then use the formula interface of t.test.

Below is an example of a t.test of the probe1 values by sex for the first 6 samples (adjust the column range for the number of samples you want):

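# combined data frame: t() turns samples into rows; cbind pairs rows by position,
# so both tables must list the samples in the same order
methyl_cov_df <- cbind(t(methylation[, 1:6]), covariates)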

methyl_cov_df:

        probe1 probe2 probe3 probe4 patient sex ethnicity
sample1 0.1111 0.1111 0.1111 0.1111      p1   0 caucasian
sample2 0.2222 0.2222 0.2222 0.2222      p2   1 caucasian
sample3 0.3333 0.3333 0.3333 0.3333      p3   1 caucasian
sample4 0.4444 0.4444 0.4444 0.4444      p4   0 caucasian
sample5 0.5555 0.5555 0.5555 0.5555      p5   0 caucasian
sample6 0.6666 0.6666 0.6666 0.6666      p6   1 caucasian
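
On the question of tying in the sample names: cbind only lines rows up by position, so it can be safer to index both tables by sample name. The following is a minimal sketch, assuming the row names of covariates and the column names of methylation are the sample IDs, as in the example data. The helper name shared is introduced here for illustration, and the toy data is rebuilt inline only so that the snippet runs on its own.

```r
# Toy data mirroring the example tables in this post
covariates <- data.frame(
  patient   = paste0("p", 1:6),
  sex       = c(0, 1, 1, 0, 0, 1),
  ethnicity = "caucasian",
  row.names = paste0("sample", 1:6)
)
methylation <- as.data.frame(matrix(
  rep(c(0.1111 * 1:9, 1.111), each = 4), nrow = 4,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:10))
))

# Sample IDs present in both tables (hypothetical helper name)
shared <- intersect(rownames(covariates), colnames(methylation))

# Index both tables by name so each row pairs a sample with its own covariates
methyl_cov_df <- cbind(t(methylation[, shared]), covariates[shared, ])

# Sanity check: the combined rows are exactly the shared samples, in order
stopifnot(identical(rownames(methyl_cov_df), shared))
```

Samples that appear in only one of the two tables are dropped by intersect, so it may be worth comparing length(shared) with the number of samples you expect.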


# t.test by formula: subset the rows to choose the samples (the first 6 here)
t.test(formula = probe1 ~ sex, data = methyl_cov_df[1:6, ])

Welch Two Sample t-test

data:  probe1 by sex
t = -0.19612, df = 4, p-value = 0.8541
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  -0.5613197  0.4872530
sample estimates:
mean in group 0 mean in group 1 
      0.3703333       0.4073667 
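
Since the question asks for a series of t-tests, one per probe, the same test can be repeated over every row of the methylation table. This is a rough sketch rather than a definitive approach: it assumes sample IDs match by name as in the example data, the names shared, p_values and p_adjusted are made up for illustration, and the Benjamini-Hochberg adjustment is one common choice for multiple testing, not something the original post prescribes.

```r
# Toy data mirroring the example tables in this post
covariates <- data.frame(
  patient   = paste0("p", 1:6),
  sex       = c(0, 1, 1, 0, 0, 1),
  ethnicity = "caucasian",
  row.names = paste0("sample", 1:6)
)
methylation <- as.data.frame(matrix(
  rep(c(0.1111 * 1:9, 1.111), each = 4), nrow = 4,
  dimnames = list(paste0("probe", 1:4), paste0("sample", 1:10))
))

# Samples with both covariates and methylation values (hypothetical helper name)
shared <- intersect(rownames(covariates), colnames(methylation))
sex <- factor(covariates[shared, "sex"])

# One Welch t-test per probe (row); keep only the p-value of each test
p_values <- apply(
  as.matrix(methylation[, shared]), 1,
  function(beta) t.test(beta ~ sex)$p.value
)

# Adjust for running many tests at once (Benjamini-Hochberg, an assumption here)
p_adjusted <- p.adjust(p_values, method = "BH")
```

The result is a named vector with one p-value per probe. With tens of thousands of probes, some form of multiple-testing adjustment matters, because many small raw p-values are expected by chance alone.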

Data:

covariates <- read.table(text = '        "patient"   "sex"   "ethnicity"
sample1    p1         0      caucasian
sample2    p2         1      caucasian
sample3    p3         1      caucasian
sample4    p4         0      caucasian
sample5    p5         0      caucasian
sample6    p6         1      caucasian', header = TRUE)

methylation <- read.table(text = "       sample1  sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
probe1  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe2  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe3  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111
probe4  0.1111  0.2222  0.3333  0.4444  0.5555  0.6666  0.7777  0.8888  0.9999  1.111", header = TRUE)
