perl - Creating a list from a file in Perl
Problem description
Here is my sample file:
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF00512 His Kinase A (phospho-acceptor) domain 402 467 2.2E-18 T 29-06-2014 IPR003661 Signal transduction histidine kinase EnvZ-like, dimerisation/phosphoacceptor domain GO:0000155|GO:0007165|GO:0016020
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 SMART SM01079 114 316 4.1E-23 T 29-06-2014 IPR006189 CHASE
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF03924 CHASE domain 115 314 1.0E-40 T 29-06-2014 IPR006189 CHASE
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor protein C-terminal signature 602 616 9.2E-11 T 29-06-2014 IPR004358 Signal transduction histidine kinase-related protein, C-terminal GO:0016310|GO:0016772
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor protein C-terminal signature 637 655 9.2E-11 T 29-06-2014 IPR004358 Signal transduction histidine kinase-related protein, C-terminal GO:0016310|GO:0016772
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor protein C-terminal signature 620 630 9.2E-11 T 29-06-2014 IPR004358 Signal transduction histidine kinase-related protein, C-terminal GO:0016310|GO:0016772
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 ProSiteProfiles PS50110 Response regulatory domain profile. 853 990 28.209 T 29-06-2014 IPR001789 Signal transduction response regulator, receiver domain GO:0000156|GO:0000160
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 SMART SM00448 cheY-homologous receiver domain 852 986 2.9E-29 T 29-06-2014 IPR001789 Signal transduction response regulator, receiver domain GO:0000156|GO:0000160
CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF00072 Response regulator receiver domain 854 986 8.5E-21 T 29-06-2014 IPR001789 Signal transduction response regulator, receiver domain GO:0000156|GO:0000160
I would like to get output like the following (with no duplicate Pfam IDs):
CA11g10610 Number PF00512, PF03924, PR00344, ...
CA10g10820 Number PF01095, PF04043, ...
Solution
Try this (I would normally prefer a hash, which gives cleaner, more structured output):
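use strict;
use warnings;

my $storage = "";    # accumulated output, one line per gene ID

# Reads from the DATA handle: place the records after a __DATA__ marker
# at the end of the script, or swap in a handle opened on your file.
while (my $line = <DATA>) {
    next unless $line =~ m/\s+Pfam\b/i;    # keep only Pfam rows

    my @elements = split /\s+/, $line;
    my ($id, $pfam) = @elements[0, 4];     # column 1 = gene ID, column 5 = accession

    if ($storage =~ m/^\Q$id\E Number /m) {
        # The ID already has a line: append the accession unless it is listed there
        $storage =~ s/^(\Q$id\E Number [^\n]*)$/$1, $pfam/m
            unless $storage =~ m/^\Q$id\E Number [^\n]*\b\Q$pfam\E\b/m;
    }
    else {
        # First time this ID is seen: start a new line for it
        $storage .= "$id Number $pfam\n";
    }
}
print $storage;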
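The answer mentions preferring a hash but does not show one, so here is a minimal sketch of that variant. It rests on some assumptions: `collect_ids` is a hypothetical helper name, the database name is taken to be whitespace-separated column 4 and the accession column 5, and the sample lines are abbreviated from the question's data. The database filter is passed in as a pattern because the desired output in the question lists PR00344, which is a PRINTS accession rather than a Pfam one.

```perl
use strict;
use warnings;

# Hypothetical helper: collect unique accessions per gene ID,
# keeping both genes and accessions in first-seen order.
sub collect_ids {
    my ($lines, $db_pattern) = @_;
    my (%seen, %ids_for, @gene_order);

    for my $line (@$lines) {
        my @f = split ' ', $line;                       # split on runs of whitespace
        next unless @f >= 5 && $f[3] =~ $db_pattern;    # column 4 = database name

        my ($gene, $acc) = @f[0, 4];                    # column 1 = gene, column 5 = accession
        push @gene_order, $gene unless exists $ids_for{$gene};
        push @{ $ids_for{$gene} }, $acc unless $seen{$gene}{$acc}++;
    }

    return map { "$_ Number " . join(', ', @{ $ids_for{$_} }) } @gene_order;
}

# Abbreviated sample rows (descriptions shortened for illustration)
my @sample = (
    "CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF00512 His Kinase A 402 467",
    "CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor 602 616",
    "CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 PRINTS PR00344 Bacterial sensor 637 655",
    "CA11g10610 96f3aa6096d8ec217ee6f8cf6a90a745 998 Pfam PF03924 CHASE domain 115 314",
);

print "$_\n" for collect_ids(\@sample, qr/^(?:Pfam|PRINTS)$/i);
# prints: CA11g10610 Number PF00512, PR00344, PF03924
```

Passing `qr/^Pfam$/i` instead would restrict the list to Pfam rows only, matching the filter in the answer above. The nested `%seen` hash does the duplicate check in constant time, which avoids rescanning a growing string for every input line.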
Thanks.