regex - Extract Decimal and Integer numbers from sentences using Perl
Problem description
I have sentences including letters, integers, and decimals.
Example:
There are 1.6mm, 2.1mmcycst.
There are many about 3mm cysts.
There are 2 cysts about 4~5mm.
("2.1mm cyst" or "2.1 mm cysts" would be the correct form, but our data contains "2.1mmcycst".)
From these sentences, I want to extract the numbers. For example:
1.6 and 2.1
3
4~5
I'm not familiar with regular expressions, and I cannot pick out only the numbers, including decimals and related signs (e.g., "~").
Here is the code:
#!/usr/bin/perl
my $qwe = "There are 1.6mm, 2.1mmcycst.";
print "$qwe\n";
if($qwe =~ /\d+(\.\d)?\d*/){
print "$&\n";
}
The script produces this output:
1.6
I am expecting 1.6 and 2.1.
How can I change my regex to match multiple numbers on a single line?
I use macOS 10.14.5 and perl v5.18.4.
Solution
Do not reinvent the wheel. If a task seems common to you, it is likely that there is a Perl module for it. Regexp::Common can be used to match common patterns, including numbers of various kinds. For example, your sample input can be extended with more complex examples of numbers, all of which can be parsed as shown below:
Create the input:
cat > in.txt <<EOF
There are 1.6mm, 2.1mmcycst.
There are many about 3mm cysts.
There are 2 cysts about 4~5mm.
The collection has 1.23E6 frozen cysts, stored at -70.5C, with cysts ranging in size from 1e-3m to 5.12E-3
EOF
Parse and print the real numbers:
perl -MRegexp::Common -lne 'print join " ", /($RE{num}{real})/g;' in.txt
Output:
1.6 2.1
3
2 4 5
1.23E6 -70.5 1e-3 5.12E-3
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-MRegexp::Common : Same as BEGIN { use Regexp::Common; }.
/($RE{num}{real})/g : Capture all real numbers in the input line $_. Parentheses capture the match, and /.../g matches as many times as possible. In list context, imposed here by join, this returns the list of all matches, which are then printed.
SEE ALSO:
perldoc perlrun : how to execute the Perl interpreter: command line switches
Note: you need to install the Regexp::Common Perl module; it is not part of the standard Perl library.