Cannot match the string between | characters in Perl

Question

I have text read from a FASTA file, and I am trying to extract the accession number in Perl, but I get no output. Here is the code:

use strict;
use warnings;

sub main {

    my $file = "PXXXXX.fasta";

    if(!open(FASTA, $file)) {
        die "Could not find $file\n";
    }
    my $myLine = <FASTA>;
    my $pat = "|";
    my @Num = $myLine =~ /$pat(.*?)$pat/;
    print($Num[0]);
    close(FASTA);

}

main();

The contents of the FASTA filehandle are:

sp|P27455|MOMP_CHLPN Major outer membrane porin OS=Chlamydia pneumoniae OX=83558 GN=ompA PE=2 SV=1
MKKLLKSALLSAAFAGSVGSLQALPVGNPSDPSLLIDGTIWEGAAGDPCDPCATWCDAIS
LRAGFYGDYVFDRILKVDAPKTFSMGAKPTGSAAANYTTAVDRPNPAYNKHLHDAEWFTN
AGFIALNIWDRFDVFCTLGASNGYIRGNSTAFNLVGLFGVKGTTVNANELPNVSLSNGVV
ELYTDTSFSWSVGARGALWECGCATLGAEFQYAQSKPKVEELNVICNVSQFSVNKPKGYK
GVAFPLPTDAGVATATGTKSATINYHEWQVGASLSYRLNSLVPYIGVQWSRATFDADNIR
IAQPKLPTAVLNLTAWNPSLLGNATALSTTDSFSDFMQIVSCQINKFKSRKACGVTVGAT
LVDADKWSLTAEARLINERAAHVSGQFRF

Any clues on how to fix the code so that it returns: P27455

Tags: regex, perl

Answer


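The pipe | is the alternation operator in regular expressions, so /|(.*?)|/ is read as three alternatives. The empty first alternative matches at the very start of the string without setting the capture group, which is why nothing is printed. You need to escape the pipe. The easiest way to do that is by using \Q and \E.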

$myLine =~ /\Q$pat\E(.*?)\Q$pat\E/;
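
As a minimal sketch of the difference, the script below runs both patterns against the start of the header line from the question. The header is hard-coded, and the variable names $header, $unescaped and $escaped are only illustrative.

```perl
use strict;
use warnings;

# Start of the header line from the question, hard-coded for illustration.
my $header = "sp|P27455|MOMP_CHLPN Major outer membrane porin";
my $pat    = "|";

# Unescaped: the pipe acts as alternation, so the capture group is never set.
my ($unescaped) = $header =~ /$pat(.*?)$pat/;
print "unescaped: ", (defined $unescaped ? $unescaped : "undef"), "\n";

# Escaped with \Q...\E: the pipe is matched as a literal character.
my ($escaped) = $header =~ /\Q$pat\E(.*?)\Q$pat\E/;
print "escaped: $escaped\n";
```

Run as written, this should print "unescaped: undef" followed by "escaped: P27455".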

Or you could use the quotemeta built-in.

my $pat = quotemeta "|";
my @Num = $myLine =~ /$pat(.*?)$pat/; # or use [^$pat]+ 
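
The comment above mentions a negated character class as an alternative to the non-greedy .*?. A short sketch of both forms side by side, again on a hard-coded header ($header, $lazy and $class are illustrative names):

```perl
use strict;
use warnings;

my $header = "sp|P27455|MOMP_CHLPN Major outer membrane porin";

# quotemeta puts a backslash in front of the pipe, so $pat holds \|
my $pat = quotemeta "|";

# Non-greedy capture up to the next pipe.
my ($lazy) = $header =~ /$pat(.*?)$pat/;

# Negated character class: capture one or more characters that are not a pipe.
my ($class) = $header =~ /$pat([^$pat]+)$pat/;

print "$lazy $class\n";
```

Both captures give P27455 here. The character-class form states directly that the field cannot contain a pipe, which some people find easier to read.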

You can also just not use a regular expression search and simply split the line. If you always want the second column, this will do just as well.

my (undef, $num) = split /\|/, $myLine;
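
Putting the split approach into a complete script might look like the sketch below. The helper name accession_from_header is hypothetical, and an in-memory filehandle stands in for PXXXXX.fasta so the example runs on its own. With a real file you would pass the file name to open instead of the \$data reference.

```perl
use strict;
use warnings;

# Hypothetical helper: return the second |-separated field of a header line.
sub accession_from_header {
    my ($line) = @_;
    chomp $line;
    my (undef, $num) = split /\|/, $line;
    return $num;
}

# In-memory stand-in for the FASTA file (assumption: header is the first line).
my $data = "sp|P27455|MOMP_CHLPN Major outer membrane porin\nMKKLLKSALL\n";
open(my $fh, '<', \$data) or die "Could not open in-memory data: $!\n";
my $first = <$fh>;
close($fh);

print accession_from_header($first), "\n";
```

This uses a lexical filehandle and the three-argument form of open, which is generally preferred over a bareword handle such as FASTA.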
