Code block inside a regex while loop works, but shows no results until manually terminated

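Problem description

I'm new to Perl and to CS in general, and I'm trying to write some bioinformatics-related code purely for learning purposes. I'm trying to go through a text file and find all occurrences of a particular sequence ($motif) using the simple match operator and a while loop. The program works fine when I define $motif in the code itself. When I take it from user input instead, the code in my while loop does run, but not properly: the program does not terminate, and when I terminate it manually it sometimes shows some of the expected results and sometimes all of them.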

use strict;
use warnings;
use feature ':5.28';


print 'Enter the file name containing the sequence:';

my $filename = <>;

chomp $filename;

open(SEQ, '<', $filename)
  or die "Could not open file '$filename' $!";

$/ = ''; #to read the whole file at once as it'll stop reading only when an undefined character comes up

my $row = <SEQ>; #storing the sequence from file to a variable
chomp $row;

print "\nEnter the Motif sequence to be searched:";

my $motif = <>;
my $counter = 0;
chomp $motif;

while ($row =~ m|($motif)|g) {
    $counter++;
    print"\n";
    print "The motif's occurnce $counter ends at position ", pos$row, "\n";
}

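The expected output is a list of all occurrences of $motif, but the program does not terminate. When I terminate it manually with Ctrl+C, it shows only the first 2-3 occurrences, whereas defining $motif in the code itself gives hundreds of matches in an instant.

If I assign the file's sequence (the one I'm searching) directly to the variable $row in my code, the while loop also runs fine. The loop only misbehaves when both the input file name ($filename), whose contents are written to $row, and the sequence to search for ($motif) come from the user. With either one assigned inside the code, the program functions normally.

Tags: regex, perl, while-loop

Solution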


You've changed the input record separator ($/) from its default value of \n to '' (the empty string). An empty string puts Perl in paragraph mode, where a record ends at blank lines instead of at a single newline.
At this line: my $motif = <>; input is expected, but it no longer ends as usual when you press Enter (a single \n). This is where your program gets stuck: it keeps waiting for more input until it receives EOF (end of file). You can send EOF with Ctrl+D (or Ctrl+Z on Windows) so the program continues.

chomp uses the input record separator as well, so its behavior changes too (the first chomp is unaffected because it is called before the change): in paragraph mode it removes all trailing newlines, and with $/ set to undef it removes nothing.
You should restore the original value of $/ after reading the file, or better, change it only locally. Also note that an empty string does not mean "read everything": it selects paragraph mode. To read a file in "slurp mode", set $/ to undef.

You can read more here: slurp mode - reading a file in one step
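
To make the difference between the three settings concrete, here is a minimal sketch. It reads from an in-memory string rather than a real file, and the helper names read_records and chomped_with are made up for this illustration; they are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from an in-memory string
# using the given input record separator.
sub read_records {
    my ($text, $separator) = @_;
    open(my $fh, '<', \$text) or die "Could not open in-memory handle: $!";
    local $/ = $separator;    # restored automatically when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

# Hypothetical helper: chomp a copy of a string under a given separator.
sub chomped_with {
    my ($string, $separator) = @_;
    local $/ = $separator;
    chomp $string;
    return $string;
}

my $data = "ACGT\nTTGA\n\nCCCC\n";    # made-up sample with one blank line

my @lines      = read_records($data, "\n");     # default: one record per line
my @paragraphs = read_records($data, '');       # paragraph mode: split at blank lines
my @whole      = read_records($data, undef);    # slurp mode: a single record

printf "line mode:      %d record(s)\n", scalar @lines;
printf "paragraph mode: %d record(s)\n", scalar @paragraphs;
printf "slurp mode:     %d record(s)\n", scalar @whole;

# chomp follows $/ as well: compare what is left of "ACCCC\n\n" in each mode.
printf "chars left after chomp: %d (line), %d (paragraph), %d (slurp)\n",
    length chomped_with("ACCCC\n\n", "\n"),
    length chomped_with("ACCCC\n\n", ''),
    length chomped_with("ACCCC\n\n", undef);
```

Because local is used inside each helper, the change to $/ never leaks out, so a later my $motif = <>; in the same program would still end when Enter is pressed.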

Simple update to your code (make sure to delete the $/ = ''; line):

my $row = '';
{
    open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
    local $/ = undef;
    $row = <$fh>;
    close $fh;
}

That said, I would not recommend doing it this way. It is probably better to read the file into an array of lines, or to use a more modern approach such as Path::Tiny.

I've made some small changes to your code and tested it successfully with MT_mouse.txt.
Code:

#!/usr/bin/perl

use strict;
use warnings;

print 'Enter the file name containing the sequence: ';
my $filename = <>;
chomp $filename;

open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
my @file_lines = <$fh>;
close $fh;

print 'Enter the Motif sequence to be searched: ';
my $motif = <>;
chomp $motif;
print 'Read ' . scalar(@file_lines) . " lines at file: '$filename'\nmotif: '$motif'\n";

my ($line, $occurrences) = (0, 0);
foreach my $row (@file_lines) {
    $line++;
    next unless $row =~ /\Q$motif\E/;
    my @motif_index = ();
    my $position = 0;
    while((my $index = index $row, $motif, $position) >= 0) {
        push(@motif_index, $index);
        $position = $index + length $motif;
    }
    print "Motif found on row#$line\tat position(s): " . join(', ', @motif_index) . ".\n";
    $occurrences += scalar @motif_index;
}

print "\nMotif '$motif' was " . ($occurrences ? "found $occurrences times" : 'not found') . " at file: '$filename'.\n";

__END__

Output:

Enter the file name containing the sequence: MT_mouse.txt
Enter the Motif sequence to be searched: ACCCC
Read 272 lines at file: 'MT_mouse.txt'
motif: 'ACCCC'
Motif found on row#4    at position(s): 41.
Motif found on row#9    at position(s): 19.
Motif found on row#11   at position(s): 40.
Motif found on row#23   at position(s): 8.
Motif found on row#33   at position(s): 3.
Motif found on row#59   at position(s): 1.
Motif found on row#61   at position(s): 31.
Motif found on row#65   at position(s): 37.
Motif found on row#83   at position(s): 3.
Motif found on row#98   at position(s): 22.
Motif found on row#115  at position(s): 48.
Motif found on row#122  at position(s): 26.
Motif found on row#132  at position(s): 49.
Motif found on row#133  at position(s): 36.
Motif found on row#173  at position(s): 21.
Motif found on row#183  at position(s): 21.
Motif found on row#188  at position(s): 52.
Motif found on row#199  at position(s): 7.
Motif found on row#209  at position(s): 51.
Motif found on row#228  at position(s): 28.
Motif found on row#230  at position(s): 43.
Motif found on row#247  at position(s): 45.
Motif found on row#249  at position(s): 53.
Motif found on row#269  at position(s): 11, 18, 39.

Motif 'ACCCC' was found 26 times at file: 'MT_mouse.txt'.
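
One thing to keep in mind: the line-by-line version above cannot see a motif that is split across a line break. If the file should be treated as one contiguous sequence, a sketch closer to the original while loop could look like the following. The helper name motif_end_positions and the sample data are made up for this illustration, and removing the newlines assumes the file contains nothing but sequence lines.

```perl
use strict;
use warnings;

# Hypothetical helper: return the end position of every non-overlapping
# occurrence of $motif in $sequence.
sub motif_end_positions {
    my ($sequence, $motif) = @_;
    my @ends;
    # \Q...\E quotes regex metacharacters so the motif is matched literally.
    while ($sequence =~ /\Q$motif\E/g) {
        push @ends, pos($sequence);    # pos() is the offset just past the match
    }
    return @ends;
}

my $text = "GGACCCCTT\nACGTACC\nCCAA\n";    # made-up sample data
(my $sequence = $text) =~ tr/\n//d;         # join the lines into one string

my $motif = 'ACCCC';
my @ends  = motif_end_positions($sequence, $motif);
print "Motif '$motif' ends at position(s): @ends\n";
```

In this sample the second occurrence starts on the second line and finishes on the third, so it is only found once the newlines are removed. Like the original code, this reports pos, which is the offset immediately after each match rather than the start index that index returns.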
