How to convert a (pairwise comparison) matrix into columns using Perl?

Problem description

Some time ago I used a Perl script to convert a matrix into columns.

The matrix (a pairwise comparison matrix) looks like this (727 * 727 = 528,529 values):

ID (column name)
NP_073551           
QJY77946             0.3872 (0.0293 0.0757)
QJY77954             0.3668 (0.0273 0.0745) 0.4851 (0.0041 0.0085)
QJY77962             0.3668 (0.0273 0.0745) 0.3767 (0.0041 0.0109)-1.0000 (0.0000 0.0024)
...

The empty cells are comparisons of an ID with itself; every other cell has the form dN/dS (dN dS).

To do the conversion, I first ran this script:

use strict;
use warnings;

my $file = $ARGV[0] || "file.txt" ;

open (F, $file) or die;
while (my $linea = <F>) {
  $linea =~ s/\n// ;
  $linea =~ s/\r// ;
  $linea =~ s/\s+/\t/g ;

  print $linea;
  print "\n";
}

close F;

exit;

Then, to generate the columns, I used:

use strict;
use warnings;

my $file = $ARGV[0] || "file_01.txt" ;
my @content;

#### Reading and storing file into an array
open (F, $file) or die;
while (my $linea = <F>) {
  $linea =~ s/\n// ;
  $linea =~ s/\r// ;
  next if ($linea =~ /^\t/);
  push @content, $linea;
}
close F;

##### Analyzing content
#### 1. storing ids in an array
my @ids;
foreach my $term (@content) {
  next if (length($term) < 1);
  my @partition = split ("\t", $term);
  my $idd = $partition[0];
  next if (length($idd) < 2);
  push @ids, $idd;
}

#### 2. using ids to create the pairs
foreach my $term (@content) {
  next if (length($term) < 1);
  my @partition2 = split ("\t", $term);
  my $a = scalar @partition2;
  my $id1 = $partition2[0];

  my $nn = 0;
  foreach my $term2 (@partition2) {
    my $id2 = $ids[$nn];
    if ($id1 eq $id2) {
      print "$id1\t$id2\tNA\n";
    } else {
      my $nk = $nn+1;
      my $value = $partition2[$nk];
      print "$id1\t$id2\t$value\n";
    }
    $nn++;
  }

  #print "$id1\n";
}


exit;

The result of this script:

NP_073551   NP_073551   NA
QJY77946    NP_073551   0.3872
QJY77946    QJY77946    NA
QJY77946    QJY77954    0.0757)
QJY77946    QJY77962    
QJY77954    NP_073551   0.3668
QJY77954    QJY77946    (0.0273
QJY77954    QJY77954    NA
QJY77954    QJY77962    0.4851
QJY77954    QJY77970    (0.0041
QJY77954    QJY77978    0.0085)
QJY77954    QEO75985    

But I need this (the generated columns):

ID_1       ID_2      Omega    dN      dS
NP_073551  NP_073551  NA 
NP_073551  QJY77946  0.3872   0.0293  0.0757         
QJY77954   NP_073551 0.3668   0.0273  0.0745
QJY77962   QJY77954  0.3668   0.0273  0.0745       
...                      

Tags: perl, matrix

Solution


Use a hash to group the records, keyed by the two IDs of each pair.

my %grouped;
for (...) {
   ...
   my $row = $grouped{$id1}{$id2} //= { id1 => $id1, id2 => $id2 };
   $row->{$key} = $value;
}

for (values(%grouped)) {
   for my $row (values(%$_)) {
      my ($id1, $id2, $omega, $dn, $ds) = $row->@{qw( id1 id2 omega dn ds )};
      ...
   }
}
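
As a fuller illustration of the grouping idea above, here is a minimal sketch that parses the matrix directly instead of splitting it on whitespace first. It assumes the input contains only data rows (an ID followed by zero or more "omega (dN dS)" cells, lower triangle only), that the IDs are unique, and that the row ID should appear as ID_1 and the column ID as ID_2. The names read_matrix, format_rows and $NUM are made up for this example, and the sample rows are copied from the question.

```perl
use strict;
use warnings;

# Matches one number such as 0.3872 or -1.0000.
my $NUM = qr/-?\d+(?:\.\d+)?/;

# Read a lower-triangular matrix from a filehandle and return
# (\@ids, \%grouped), where $grouped{$id1}{$id2} holds omega, dn and ds.
sub read_matrix {
    my ($fh) = @_;
    my (@ids, %grouped);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        next unless $line =~ /^(\S+)(.*)$/;    # skip blank lines
        my ($id1, $rest) = ($1, $2);
        my $col = 0;
        # Each cell is "omega (dN dS)"; cells may touch, as in ")-1.0000".
        while ($rest =~ /($NUM)\s*\(\s*($NUM)\s+($NUM)\s*\)/g) {
            my ($omega, $dn, $ds) = ($1, $2, $3);
            die "Row $id1 has more cells than preceding IDs\n" if $col >= @ids;
            my $id2 = $ids[$col++];            # column j belongs to the j-th ID read
            $grouped{$id1}{$id2} = { omega => $omega, dn => $dn, ds => $ds };
        }
        push @ids, $id1;
    }
    return (\@ids, \%grouped);
}

# Turn the grouped hash into tab-separated lines, one per pair,
# in the same order as the rows and columns of the matrix.
sub format_rows {
    my ($ids, $grouped) = @_;
    my @lines = (join "\t", qw(ID_1 ID_2 Omega dN dS));
    for my $i (0 .. $#$ids) {
        my $id1 = $ids->[$i];
        for my $j (0 .. $i - 1) {
            my $id2  = $ids->[$j];
            my $cell = $grouped->{$id1}{$id2} or next;
            push @lines, join "\t", $id1, $id2, @{$cell}{qw(omega dn ds)};
        }
        push @lines, join "\t", $id1, $id1, 'NA';    # self-comparison
    }
    return @lines;
}

# Sample rows from the question, read through an in-memory filehandle.
my $sample = <<'END';
NP_073551
QJY77946             0.3872 (0.0293 0.0757)
QJY77954             0.3668 (0.0273 0.0745) 0.4851 (0.0041 0.0085)
QJY77962             0.3668 (0.0273 0.0745) 0.3767 (0.0041 0.0109)-1.0000 (0.0000 0.0024)
END

open my $fh, '<', \$sample or die "Cannot read sample: $!";
my ($ids, $grouped) = read_matrix($fh);
close $fh;
print "$_\n" for format_rows($ids, $grouped);
```

To run this against a real file, replace the in-memory handle with open my $fh, '<', $file. Matching whole cells with a regular expression avoids the problem seen in the question, where replacing every run of whitespace with a tab split each "omega (dN dS)" cell into three separate fields.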
