Creating a list from a file in Perl

Question

This is my sample file:

CA11g10610  96f3aa6096d8ec217ee6f8cf6a90a745    998 Pfam    PF00512 His Kinase A (phospho-acceptor) domain  402 467 2.2E-18 T   29-06-2014  IPR003661   Signal transduction histidine kinase EnvZ-like, dimerisation/phosphoacceptor domain GO:0000155|GO:0007165|GO:0016020
CA11g10610  96f3aa6096d8ec217ee6f8cf6a90a745    998 SMART   SM01079     114 316 4.1E-23 T   29-06-2014  IPR006189   CHASE   
CA11g10610  96f3aa6096d8ec217ee6f8cf6a90a745    998 Pfam    PF03924 CHASE domain    115 314 1.0E-40 T   29-06-2014  IPR006189   CHASE   
CA11g10610  96f3aa6096d8ec217ee6f8cf6a90a745    998 PRINTS  PR00344 Bacterial sensor protein C-terminal signature   602 616 9.2E-11 T   29-06-2014  IPR004358   Signal transduction histidine kinase-related protein, C-terminal    GO:0016310|GO:0016772
CA11g10610  96f3aa6096d8ec217ee6f8cf6a90a745    998 PRINTS  PR00344 Bacterial sensor protein C-terminal signature   637 655 9.2E-11 T   29-06-2014  IPR004358   Signal transduction histidine kinase-related protein, C-terminal    GO:0016310|GO:0016772
CA11g10610  96f3aa6096d8ec217ee6f8cf6a90a745    998 PRINTS  PR00344 Bacterial sensor protein C-terminal signature   620 630 9.2E-11 T   29-06-2014  IPR004358   Signal transduction histidine kinase-related protein, C-terminal    GO:0016310|GO:0016772
CA11g10610  96f3aa6096d8ec217ee6f8cf6a90a745    998 ProSiteProfiles PS50110 Response regulatory domain profile. 853 990 28.209  T   29-06-2014  IPR001789   Signal transduction response regulator, receiver domain GO:0000156|GO:0000160
CA11g10610  96f3aa6096d8ec217ee6f8cf6a90a745    998 SMART   SM00448 cheY-homologous receiver domain 852 986 2.9E-29 T   29-06-2014  IPR001789   Signal transduction response regulator, receiver domain GO:0000156|GO:0000160
CA11g10610  96f3aa6096d8ec217ee6f8cf6a90a745    998 Pfam    PF00072 Response regulator receiver domain  854 986 8.5E-21 T   29-06-2014  IPR001789   Signal transduction response regulator, receiver domain GO:0000156|GO:0000160

I want to get output like the following, with no duplicate Pfam IDs on a line:

CA11g10610  Number  PF00512, PF03924, PR00344, ...
CA10g10820  Number  PF01095, PF04043, ...

Tags: perl

Solution


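Try this. (I would normally prefer a hash for standard, structured output; this version builds the result in a single string.)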

use strict;
use warnings;

my $storage = "";    # Accumulates one output line per gene ID
while (my $line = <DATA>)    # Assumes the sample rows follow a __DATA__ marker
{
    if ($line =~ m/\s+Pfam\b/i)    # Keep only rows from the Pfam analysis
    {
        my @elements = split /\s+/, $line;
        my ($gene, $pfam) = @elements[0, 4];    # Gene ID and Pfam accession
        if ($storage =~ m/^\Q$gene\E\b/m)    # Is this gene ID already stored?
        {
            # Append the Pfam ID to that gene's line unless it is already listed
            $storage =~ s/^(\Q$gene\E\b[^\n]*)$/$1, $pfam/m
                unless $storage =~ m/^\Q$gene\E\b[^\n]*\b\Q$pfam\E\b/m;
        }
        else
        {
            $storage .= "$gene Number $pfam\n";    # Otherwise start a new line for this gene
        }
    }
}
print $storage;
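
The hash-based approach mentioned above might look roughly like the sketch below. It is not part of the original answer: the helper name pfam_by_gene is hypothetical, it assumes the first five columns contain no embedded spaces so that splitting on whitespace is safe, and it keeps the literal word "Number" as a placeholder just as the code above does. The last input row is a made-up repeat of the first one, added only to show that duplicates are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: takes rows in the format of the sample file and
# returns one "gene Number id1, id2, ..." string per gene, without duplicates.
sub pfam_by_gene {
    my @lines = @_;
    my (%ids, %seen, @order);
    for my $line (@lines) {
        my @f = split ' ', $line;                      # split on runs of whitespace
        next unless @f >= 5 && lc $f[3] eq 'pfam';     # keep Pfam rows only
        my ($gene, $pfam) = @f[0, 4];
        push @order, $gene unless exists $ids{$gene};  # remember first-seen order
        $ids{$gene} ||= [];
        push @{ $ids{$gene} }, $pfam
            unless $seen{$gene}{$pfam}++;              # skip IDs already recorded
    }
    return map { "$_ Number " . join(', ', @{ $ids{$_} }) } @order;
}

# Usage with rows shortened from the sample file.
my @rows = (
    "CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF00512 His Kinase A",
    "CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 SMART SM01079",
    "CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF03924 CHASE domain",
    "CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF00072 Response regulator",
    "CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF00512 His Kinase A",
);
print "$_\n" for pfam_by_gene(@rows);
# prints: CA11g10610 Number PF00512, PF03924, PF00072
```

Keeping the IDs in a hash of arrays avoids re-scanning a growing string with a regex for every row, and the separate %seen hash makes the duplicate check a simple lookup.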

Thanks.

