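regex - Code block inside a regex while loop works, but shows no results until manually terminated
Problem description
I'm new to Perl and to CS in general, and I'm trying to write some bioinformatics-related code just for learning purposes. I'm trying to go through a text file and find all occurrences of a specific sequence ($motif) using a simple match operator and a while loop. The program works fine when I define $motif in the code itself, but when I take it from user input, the code inside my while loop does run, just not properly, and it never terminates. When I terminate it manually, it sometimes shows some of the expected results and sometimes all of them.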
use strict;
use warnings;
use feature ':5.28';
print 'Enter the file name containing the sequence:';
my $filename = <>;
chomp $filename;
open(SEQ, '<', $filename)
or die "Could not open file '$filename' $!";
$/ = ''; #to read the whole file at once as it'll stop reading only when an undefined character comes up
my $row = <SEQ>; #storing the sequence from file to a variable
chomp $row;
print "\nEnter the Motif sequence to be searched:";
my $motif = <>;
my $counter = 0;
chomp $motif;
while ($row =~ m|($motif)|g) {
$counter++;
print"\n";
print "The motif's occurnce $counter ends at position ", pos$row, "\n";
}
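The expected output is a list of all occurrences of $motif, but the program does not terminate. When I terminate it manually with Ctrl+C, it shows only the first 2-3 occurrences, rather than the hundreds of matches it gives in an instant when $motif is defined in the code itself.
If I assign the file's sequence (the one I'm searching in) directly to the variable $row in my code, the while loop also runs fine. The loop only misbehaves when I take both the input file name ($filename), whose contents are read into $row, and the sequence to search for ($motif) from the user. If either one is assigned inside the code, the program works normally.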
Solution
You've changed the input record separator ($/) from its default value of \n to '' (the empty string).
At this line: my $motif = <>;
input is expected, but it no longer ends with Enter (the \n) as usual. This is where your program gets stuck: with $/ set to '', Perl reads in paragraph mode, so a single newline does not end the record and the read keeps waiting. You can send EOF (end of file) with Ctrl+D (or Ctrl+Z on Windows) so that the program continues.
chomp depends on the input record separator as well, so it also behaves differently after the change: in paragraph mode it removes all trailing newlines rather than just one (the first chomp is unaffected, because it is called before the change).
You should restore the separator's original value afterwards (or, even better, change it only locally). Also note that an empty string selects paragraph mode; if you want to read the file in "slurp mode", set $/ to undef instead.
You can read more here: slurp mode - reading a file in one step
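To make the difference between the three settings concrete, here is a minimal sketch that reads the first record from the same data under each value of $/. The sample string and the helper name first_record are made up for illustration, and an in-memory filehandle stands in for the real file or terminal.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a file: two lines, a blank line, one more line.
my $data = "ACGT\nTTGA\n\nGGCC\n";

# Read only the first record from $data under the given value of $/.
sub first_record {
    my ($separator) = @_;
    local $/ = $separator;    # the change is undone when the sub returns
    open(my $fh, '<', \$data) or die "Could not open in-memory handle: $!";
    my $record = <$fh>;
    close $fh;
    return $record;
}

my $line      = first_record("\n");     # default: stops after the first newline
my $paragraph = first_record('');       # paragraph mode: stops at the blank line
my $slurp     = first_record(undef);    # slurp mode: reads everything

printf "line mode read %d characters\n",      length $line;
printf "paragraph mode read %d characters\n", length $paragraph;
printf "slurp mode read %d characters\n",     length $slurp;
```

Only the undef setting returns the whole input, which is why '' is the wrong value for reading a complete file, and why a prompt read with <> no longer ends at the first Enter.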
Simple update to your code (make sure to delete the $/ = ''; line):
my $row = '';
{
open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
local $/ = undef;
$row = <$fh>;
close $fh;
}
Though I would not recommend doing it this way. It is probably better to read the file into an array of lines, or to use a more modern module such as Path::Tiny.
I've made some small changes to your code and tested it successfully with MT_mouse.txt.
Code:
#!/usr/bin/perl
use strict;
use warnings;
print 'Enter the file name containing the sequence: ';
my $filename = <>;
chomp $filename;
open(my $fh, '<', $filename) or die "Could not open file '$filename' $!";
my @file_lines = <$fh>;
close $fh;
print 'Enter the Motif sequence to be searched: ';
my $motif = <>;
chomp $motif;
print 'Read ' . scalar(@file_lines) . " lines at file: '$filename'\nmotif: '$motif'\n";
my ($line, $occurrences) = (0, 0);
foreach my $row (@file_lines) {
$line++;
next unless $row =~ /\Q$motif\E/;
my @motif_index = ();
my $position = 0;
while((my $index = index $row, $motif, $position) >= 0) {
push(@motif_index, $index);
$position = $index + length $motif;
}
print "Motif found on row#$line\tat position(s): " . join(', ', @motif_index) . ".\n";
$occurrences += scalar @motif_index;
}
print "\nMotif '$motif' was " . ($occurrences ? "found $occurrences times" : 'not found') . " at file: '$filename'.\n";
__END__
Output:
Enter the file name containing the sequence: MT_mouse.txt
Enter the Motif sequence to be searched: ACCCC
Read 272 lines at file: 'MT_mouse.txt'
motif: 'ACCCC'
Motif found on row#4 at position(s): 41.
Motif found on row#9 at position(s): 19.
Motif found on row#11 at position(s): 40.
Motif found on row#23 at position(s): 8.
Motif found on row#33 at position(s): 3.
Motif found on row#59 at position(s): 1.
Motif found on row#61 at position(s): 31.
Motif found on row#65 at position(s): 37.
Motif found on row#83 at position(s): 3.
Motif found on row#98 at position(s): 22.
Motif found on row#115 at position(s): 48.
Motif found on row#122 at position(s): 26.
Motif found on row#132 at position(s): 49.
Motif found on row#133 at position(s): 36.
Motif found on row#173 at position(s): 21.
Motif found on row#183 at position(s): 21.
Motif found on row#188 at position(s): 52.
Motif found on row#199 at position(s): 7.
Motif found on row#209 at position(s): 51.
Motif found on row#228 at position(s): 28.
Motif found on row#230 at position(s): 43.
Motif found on row#247 at position(s): 45.
Motif found on row#249 at position(s): 53.
Motif found on row#269 at position(s): 11, 18, 39.
Motif 'ACCCC' was found 26 times at file: 'MT_mouse.txt'.
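One caveat with searching line by line: a motif that is split across a line break is never seen, which can matter if the sequence is wrapped at a fixed width. As an alternative, here is a minimal sketch of the original regex loop with the separator change kept local and the line breaks removed before matching. The sample data is made up for illustration and an in-memory filehandle replaces the real file; as in the question, pos reports the offset just past the end of each match.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the sequence file: one sequence wrapped over three lines.
my $file_contents = "GGACC\nCCTTA\nCCCCG\n";
my $motif         = 'ACCCC';

my $sequence;
{
    open(my $fh, '<', \$file_contents) or die "Could not open in-memory handle: $!";
    local $/ = undef;    # slurp mode, restored at the end of this block
    $sequence = <$fh>;
    close $fh;
}
$sequence =~ tr/\r\n//d;    # join the wrapped lines into one string

# \Q...\E makes the user-supplied motif match literally, not as a regex.
my @end_positions;
while ($sequence =~ /\Q$motif\E/g) {
    push @end_positions, pos($sequence);    # offset just past the end of the match
}

print "Motif '$motif' ends at position(s): @end_positions\n";
# prints: Motif 'ACCCC' ends at position(s): 7 14
```

Both matches in this sample straddle a line break, so a per-line search would report none. Because $/ is localized inside the block, a later prompt read with <> still ends at Enter.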