Sed regex matches individual letters instead of words

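Question

I created a file that logs the training of a random forest classifier and a logistic regression model. It contains the following text: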

Creating logistic regression model...
Done.
Creating random forest classifier model...
building tree 1 of 27
building tree 2 of 27
building tree 3 of 27
building tree 4 of 27
building tree 5 of 27
building tree 6 of 27
building tree 7 of 27
building tree 8 of 27
building tree 9 of 27
building tree 10 of 27
building tree 11 of 27
building tree 12 of 27
building tree 13 of 27
building tree 14 of 27
building tree 15 of 27
building tree 16 of 27
building tree 17 of 27
building tree 18 of 27
building tree 19 of 27
building tree 20 of 27
building tree 21 of 27
building tree 22 of 27
building tree 23 of 27
building tree 24 of 27
building tree 25 of 27
building tree 26 of 27
building tree 27 of 27
Train scores:
    Logistic Regression Recall: 0.6892336879192357
    Random Forest Recall: 0.5848905752422251
Test scores:
    Logistic Regression Recall: 0.6746186562629912
    Random Forest Recall: 0.5647724728982124

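I want to extract only the score lines. I tried sed -n '/[Train|Test|Recall]/p' scores (scores being the file name), but for some reason it still prints the entire file, even though -n should suppress all output except the lines that match the pattern.

When I run cat scores | grep "[Train|Test|Recall]" -, the match highlighting picks out every letter in each line that appears in [Train|Test|Recall] rather than the actual words: for example, Creating logistic regression model... is highlighted as _creatin_ l__istic re_ressi_n ___el.... The problem persists even when I add word boundaries: cat scores | grep "[\bTrain\b|\bTest\b|\bRecall\b]" -.

My understanding of grep was that it should match each word in full, and that the pipe between the words should mark each one as its own pattern to check. How do I need to write this regex, and how do I specify whatever arguments I need in sed?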

Tags: regex, sed, grep

Answer


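Square brackets [] enclose a list of characters, any one of which may match. That is why you often see examples such as gr[ae]y, which matches both gray and grey. Your pattern [Train|Test|Recall] therefore matches any single character from the set T, r, a, i, n, |, e, s, t, R, c, l, and almost every line in the file contains at least one of those.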
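The difference can be sketched with a few sample lines. This assumes a POSIX shell and a grep that supports -E and -c; the variable names bracket_count and alt_count are made up for illustration.

```shell
# Bracket expression: a line matches if it contains at least ONE of the
# characters T, r, a, i, n, |, e, s, t, R, c, l.
bracket_count=$(printf '%s\n' 'Done.' 'building tree 1 of 27' 'Train scores:' \
    | grep -c '[Train|Test|Recall]')

# Alternation (extended regex): a line matches only if it contains one of
# the whole strings Train, Test, or Recall.
alt_count=$(printf '%s\n' 'Done.' 'building tree 1 of 27' 'Train scores:' \
    | grep -E -c 'Train|Test|Recall')

echo "bracket: $bracket_count, alternation: $alt_count"
```

With the bracket expression all three sample lines match, while the alternation matches only the Train scores: line.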

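For your use case, you can drop the square brackets and write Train|Test|Recall, or group the alternatives with parentheses as (Train|Test|Recall).

In grep's basic regular expression mode, where the grouping and alternation operators must be backslash-escaped (\| is a GNU extension in this mode), your command becomes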

cat scores | grep "\(Train\|Test\|Recall\)"

or, in extended regular expression mode, it becomes

cat scores | grep -E "(Train|Test|Recall)"

or, with sed,

cat scores | sed -E -n "/(Train|Test|Recall)/p"
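Both tools can also read the file directly, so the cat is optional. The following is a minimal sketch under stated assumptions: it builds a shortened, hypothetical stand-in for the scores file in a temporary location, and it assumes a sed that accepts -E (GNU and BSD sed both do). The names scores, grep_out, and sed_out are illustrative only.

```shell
# Hypothetical, shortened stand-in for the "scores" log file.
scores=$(mktemp)
cat > "$scores" <<'EOF'
Creating logistic regression model...
Done.
building tree 1 of 27
Train scores:
    Logistic Regression Recall: 0.6892336879192357
Test scores:
    Random Forest Recall: 0.5647724728982124
EOF

# Pass the file name as an argument instead of piping through cat.
grep_out=$(grep -E 'Train|Test|Recall' "$scores")
sed_out=$(sed -E -n '/Train|Test|Recall/p' "$scores")

# Print the lines sed kept, then remove the temporary file.
printf '%s\n' "$sed_out"
rm -f "$scores"
```

Both commands should select the same four score lines and skip the model-creation and tree-building lines.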
