Computing the mean or median of matrix columns with Perl

Problem description

Var_ID    sample1 sample2 sample3 sample4 sample5 sample6 sample7
A_1     18.66530716     0       10.45969216     52.71893547     40.04726048     32.16758825     38.27754435
A_2     25.19816467     0       12.5516306      37.95763354     28.39714834     25.7340706      37.581589
A_3     61.5006053      0       6.807664053     4.57493135      23.69514333     9.304974679     29.44245014
A_4     46.71317515     4.988346264     21.47872616     36.08568845     7.47600779      18.34871344     75.02919728
A_5     38.12488272     0       0       28.71499464     19.82997811     19.46785483     66.33787183
A_6     44.16019386     3.313750449     10.70121259     38.35466425     8.691025042     13.40792311     42.72152213
B_1     38.39720331     13.32601073     0       19.28006783     9.985810405     9.803455466     95.44530538
B_2     46.53021582     1.899838598     24.54086634     13.74342921     24.20186228     6.988206544     47.62545788
B_3     48.42890507     0       6.0308135       20.26433556     20.99119304     10.30393217     64.20344867
A_7     32.10687649     0       20.56239825     23.03079775     9.542753971     10.5395511      44.46513374
B_4     34.82673166     0       6.122746633     39.08916191     8.524472297     14.64540603     54.99744731
B_5     32.49685303     2.910517165     15.66506159     35.79294964     8.723952928     10.7058016      52.11522135
B_6     30.38974634     0       0       30.51870034     10.53778987     17.24225836     50.36058827
B_7     59.60856159     0       8.097826192     19.0468412      2.818575518     11.06841746     10.77608287
A_8     36.07790915     6.260541956     0       31.70212496     14.07396097     4.605650219     67.26011453
C_1     0       17.27445836     0       382.0309737     1.849224149     0       0
C_2     344.0389416     119.4010562     32.13217433     0       22.36821531     285.4766232     21.37974841
C_3     235.5547989     37.86357293     22.23167043     2.490045661     2.579360621     30.38709443     14.79226135
C_4     0       2.801263518     0       334.3615367     0       0       0
C_5     9.397916894     128.2900334     187.2504332     25.16745451     22.81140838     14.39668285    0

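This is the data matrix. Rows are variables and columns are sample IDs.

A_1 through A_8 form cluster A, B_1 through B_7 form cluster B, and C_1 through C_5 form cluster C.

I want to compute the mean or the median of A_1 through A_8, column by column, and use it as the value for clusterA. For the median, the expected result is: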

Var_ID  sample1 sample2 sample3 sample4 sample5 sample6 sample7
clusterA        37.10139593     0       10.58045238     33.89390671     16.95196954     15.87831827     43.59332793

Can anyone help me solve this with a Perl script?

Tags: perl, cluster-computing, mean, median, datamatrix

Solution


The following script computes both the mean and the median of every sample column, per cluster:

#!/usr/bin/perl

use strict;
use warnings;

use List::Util qw(sum);
use POSIX qw(floor ceil);

my %data   = ();
my %avg    = ();
my %median = ();


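# Read the matrix and group the rows by cluster.
while (<>) {
    next if $. == 1;                        # skip the header row
    my @fields = split;
    next unless @fields;                    # skip blank lines
    my $cluster = substr($fields[0],0,1);   # cluster = first letter of Var_ID
    $data{$cluster} = [] unless exists($data{$cluster});
    push @{$data{$cluster}}, [ @fields[1..$#fields] ];
}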

for my $cluster (keys(%data)) {
    for my $sampleNo (0..scalar(@{$data{$cluster}[0]})-1) {
        my @samples = map { $_->[$sampleNo] } @{$data{$cluster}};
        my $cnt = @samples;
        $avg{$cluster}[$sampleNo] = sum(@samples)/$cnt;
        # Sort numerically; a bare sort would compare the values as strings.
        my @sorted = sort { $a <=> $b } @samples;
        # Average the two middle elements (the same element when $cnt is odd).
        $median{$cluster}[$sampleNo] = ($sorted[floor(($cnt+1)/2)-1] +
            $sorted[ceil(($cnt+1)/2)-1])/2;
    }
}

print "Mean\n";

for my $cluster (sort keys (%data)) {
    print join("\t", ($cluster,map {sprintf "%15.9f",$_ } @{$avg{$cluster}})),"\n";
}
print "Median\n";

for my $cluster (sort keys (%data)) {
    print join("\t", ($cluster,map {sprintf "%15.9f",$_ } @{$median{$cluster}})),"\n";
}

Output:

perl test.pl < sample.txt
Mean
A      37.818389312     1.820329834    10.320165477    31.642471301    18.969159754    16.697040778    50.139427875
B      41.525459546     2.590909499     8.636759179    25.390783670    12.254808048    11.536782519    53.646221676
C     117.798331479    61.126076882    48.322855592   148.810002114     9.921641692    66.052080096     7.234401952
Median
A      37.101395935     0.000000000    10.580452375    33.893906705    16.951969540    15.878318275    43.593327935
B      38.397203310     0.000000000     6.122746633    20.264335560     9.985810405    10.705801600    52.115221350
C       9.397916894    37.863572930    22.231670430    25.167454510     2.579360621    14.396682850     0.000000000
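
One detail that is easy to get wrong when computing a median in Perl is the sort order: a bare sort compares its arguments as strings, so "7.47600779" is placed after "40.04726048", and the middle elements picked afterwards are not the numeric middle. The sketch below contrasts the two orderings on the sample5 values of A_1 through A_8 from the question. The median helper and the variable names are hypothetical, introduced here only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (an assumption for this sketch): median of a list,
# sorting numerically before picking the middle element(s).
sub median {
    my @sorted = sort { $a <=> $b } @_;
    my $cnt    = @sorted;
    return unless $cnt;                 # empty list: no median
    my $mid = int($cnt / 2);
    return $cnt % 2
        ? $sorted[$mid]                              # odd count: middle element
        : ($sorted[$mid - 1] + $sorted[$mid]) / 2;   # even count: mean of the two middle elements
}

# sample5 values of A_1 .. A_8, copied from the matrix in the question
my @sample5 = (40.04726048, 28.39714834, 23.69514333, 7.47600779,
               19.82997811, 8.691025042, 9.542753971, 14.07396097);

my @as_strings = sort @sample5;                 # string comparison
my @as_numbers = sort { $a <=> $b } @sample5;   # numeric comparison

print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
printf "median: %.9f\n", median(@sample5);
```

With the numeric ordering, the median of this column agrees with the 16.95196954 given for sample5 in the question's expected clusterA row.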
