string - Perl: counting the frequency of keys in a hash
Problem description
I extracted the first-level keys from a multidimensional hash like this:
my @string = keys %hash;
print "@string\n";
Bacteroides fragilis (strain YCH46).Agrocybe aegerita (Black poplar mushroom) (Agaricus
aegerita).Parabacteroides distasonis (strain ATCC 8503 / DSM 20701 / CIP 104284 / JCM 5825 / NCTC
11152).Pelodictyon phaeoclathratiforme (strain DSM 5477 / BU-1).Clostridium kluyveri (strain NBRC
12016).Torpedo marmorata (Marbled electric ray).Aethionema grandiflorum (Persian stone-cress).Conus
consors (Singed cone).Saguinus labiatus (Red-chested mustached tamarin).Staphylococcus haemolyticus
(strain JCSC1435).Aeromonas salmonicida (strain A449).Acinetobacter genomosp. 13.Staphylococcus
aureus (strain USA300 / TCH1516).Loxosceles variegata (Recluse spider). and so on...
I am trying to count how many times the same organism is repeated (I am sure some of them are repeated many times).
I tried this code:
my %count;
foreach my $os (@string)
{
$count{$os}++;
}
foreach my $os (sort keys %count)
{
print $os, " ", $count{$os}, "\n";
}
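But the output lists every organism as occurring only once, although I know that is not the case.
Strangely, when I define a test string by hand and repeat some organisms in it, the code works.
What is going on with my hash keys?
I can access them individually in the list, so in principle they are well defined...
Any help?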
Edit:
Data::Dumper output of the structure when the organisms are values:
'ACYP_SYNJB' => {
'94' => 'Synechococcus sp. (strain JA-2-3B\'a(2-13))
(Cyanobacteria bacterium Yellowstone B-Prime).'
},
'ACTM_STRPU' => {
'374' => 'Strongylocentrotus purpuratus (Purple sea
urchin).'
},
'A2ML1_HUMAN' => {
'1454' => 'Homo sapiens (Human).'
},
'ACTP_SALDC' => {
'549' => 'Salmonella dublin (strain CT_02021853).'
},
'ACBG2_XENLA' => {
'739' => 'Xenopus laevis (African clawed frog).'
},
'ACO1_AJECA' => {
'476' => 'Ajellomyces capsulatus (Darling\'s disease
fungus) (Histoplasma capsulatum).'
},
'ACTM_PISOC' => {
'376' => 'Pisaster ochraceus (Ochre sea star)
(Asterias ochracea).'
},
'3MGH_RHOPB' => {
'200' => 'Rhodopseudomonas palustris (strain
BisB18).'
}
};
And when they are keys:
$VAR3585 = 'Geobacter sulfurreducens (strain ATCC 51573 / DSM 12127 / PCA).';
$VAR3586 = {
'ACPS_GEOSL' => 126,
'ACP_GEOSL' => 77,
'ACKA_GEOSL' => 421,
'ACYP_GEOSL' => 91,
'ACCA_GEOSL' => 319
};
$VAR3587 = 'Bactrocera dorsalis (Oriental fruit fly) (Dacus dorsalis).';
$VAR3588 = {
'ACT3_BACDO' => 376,
'ACT5_BACDO' => 376,
'ACT1_BACDO' => 376,
'ACT2_BACDO' => 376
};
$VAR3589 = 'Caenorhabditis elegans.';
$VAR3590 = {
'ACH5_CAEEL' => 511,
'6PGD_CAEEL' => 484,
'ACM2_CAEEL' => 627,
'ACADM_CAEEL' => 417,
'ADAL_CAEEL' => 388,
'ACON_CAEEL' => 777,
'ACBP3_CAEEL' => 116,
'2AB1_CAEEL' => 495,
'3HIDH_CAEEL' => 299,
'ACH1_CAEEL' => 498,
'6PGL_CAEEL' => 269,
'2A51_CAEEL' => 542,
'2AAA_CAEEL' => 590,
'A16L2_CAEEL' => 534,
'ACH4_CAEEL' => 548,
'ACC2_CAEEL' => 445,
'ADA17_CAEEL' => 686,
'ACR5_CAEEL' => 598,
'ACTL1_CAEEL' => 360,
'ADBP1_CAEEL' => 217,
'ACH8_CAEEL' => 474,
'5NT3_CAEEL' => 376,
'ACT2_CAEEL' => 376,
'AAR2_CAEEL' => 357,
'ACH23_CAEEL' => 545,
'ACD11_CAEEL' => 617,
'ABF2_CAEEL' => 85,
'ABDH3_CAEEL' => 375,
'ABF1_CAEEL' => 85,
'ABH51_CAEEL' => 355,
'ACX15_CAEEL' => 659,
'ACC1_CAEEL' => 466,
'ABL1_CAEEL' => 1224,
'ACC3_CAEEL' => 517,
'ABH52_CAEEL' => 444,
'ACT4_CAEEL' => 376,
'ACH2_CAEEL' => 493,
'ACBP1_CAEEL' => 86,
'14332_CAEEL' => 248,
'ACR7_CAEEL' => 538,
'ACC4_CAEEL' => 408,
'ACE1_CAEEL' => 620,
'AATC_CAEEL' => 408,
'ACH6_CAEEL' => 502,
'ACH3_CAEEL' => 564,
'ACR3_CAEEL' => 487,
'ACMSD_CAEEL' => 401,
'ACH7_CAEEL' => 507,
'ACR2_CAEEL' => 575,
'ACASE_CAEEL' => 272,
'ACM3_CAEEL' => 611,
'AAPK2_CAEEL' => 626,
'ACN1_CAEEL' => 906,
'3HAO_CAEEL' => 281,
'ADAS_CAEEL' => 597,
'ACT1_CAEEL' => 376,
'A4_CAEEL' => 686,
'ADA10_CAEEL' => 922,
'A16L1_CAEEL' => 578,
'ACT3_CAEEL' => 376,
'ACP1_CAEEL' => 426,
'ACM1_CAEEL' => 713,
'AAPK1_CAEEL' => 589,
'ACOC_CAEEL' => 887,
'ACLY_CAEEL' => 1106,
'14331_CAEEL' => 248
};
$VAR3591 = 'Anopheles stephensi (Indo-Pakistan malaria mosquito).';
$VAR3592 = {
'ACES_ANOST' => 664
};
$VAR3593 = 'Bacillus thuringiensis subsp. konkukian (strain 97-27).';
$VAR3594 = {
'ACKA_BACHK' => 397,
'ACCD_BACHK' => 289,
'ACPS_BACHK' => 119,
'3MGH_BACHK' => 205,
'ACCA_BACHK' => 324,
'ACP_BACHK' => 77
};
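More precisely, I want to know which organisms have more than 50 protein IDs in my hash, then select those and get rid of the other organisms that have fewer proteins.
Solution
You wrote: "More precisely, I want to know which organisms have more than 50 protein IDs in my hash, then select those and get rid of the other organisms that have fewer proteins."
I'm not sure I fully understood your question, but it looks like you have a hash of the following kind: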
my %hash = (
'protein_id#1' => {
'some-number' => 'organism-name'
},
'protein_id#2' => {
'some-number' => 'same-or-other-organism-name',
},
...
);
and you want to count how many protein_id#X entries there are for each distinct organism name.
In that case, the following should work:
my %count;
# the "outer" hash has protein_id as key
while (my ($protein, $h2) = each %hash) {
    # the "inner" hash has organism-name as value
    # the same organism might appear several times inside the same inner hash
    # but should only be counted once per protein_id
    my %organism;
    while (my ($some_number, $o) = each %$h2) {
        $organism{$o}++;
    }
    for (keys %organism) {
        $count{$_}++;
    }
}
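The quoted goal also asks to keep only the organisms above a threshold, which the loop above does not do yet. A minimal sketch of that step, assuming the protein-keyed layout described above; the sample IDs, organism names and the `$threshold` variable are made up for illustration, and the threshold is lowered from 50 to 2 only so that the tiny sample has something to keep and something to drop:

```perl
use strict;
use warnings;

# Hypothetical sample data in the protein-keyed shape shown above;
# the IDs, numbers and organism names are invented for illustration.
my %hash = (
    'P1_ORGA' => { 100 => 'Organism A.' },
    'P2_ORGA' => { 200 => 'Organism A.' },
    'P3_ORGA' => { 300 => 'Organism A.' },
    'P1_ORGB' => { 150 => 'Organism B.' },
);

# The question asks for "more than 50"; 2 is an assumption made
# purely so this small example produces a visible result.
my $threshold = 2;

# Count each organism once per protein ID.
my %count;
for my $protein (keys %hash) {
    my %seen;
    $seen{$_} = 1 for values %{ $hash{$protein} };
    $count{$_}++ for keys %seen;
}

# Remove every protein none of whose organisms exceeds the threshold.
# Iterating over keys (a copy of the key list) makes the delete safe.
for my $protein (keys %hash) {
    my @kept = grep { $count{$_} > $threshold } values %{ $hash{$protein} };
    delete $hash{$protein} unless @kept;
}

# Report the organisms that were kept.
for my $organism (sort keys %count) {
    next unless $count{$organism} > $threshold;
    print "$organism $count{$organism}\n";
}
```

If the hash is already keyed by organism, as in the second dump in the question, each organism can appear only once in `keys %hash` because hash keys are unique, which is why the loop in the question reports 1 for every organism. In that layout, `scalar keys %{ $hash{$organism} }` should give the number of protein IDs for an organism directly.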