regex - Print lines in Linux where any word starts and ends with the same letter
Problem description
I have this input:
sie%Qu7s Kuux"oh9 ohc9ahG% hoe8Toh: Eix*ohd1 doh:bo2U Cu0doo|t zo`L9xaW
fie5Du[h Phe8aid# Opu&fai5 ieZ<aek6 hu4ga&Di Oose}p1p aiD@oos2 nu-a1Fub
ahqu5To/ ahtie[H3 ioK&u5Ai nei1Za#d poo_Th9r gu|aGh7h uZ%io2ah IeNah&v7
eif\e8AE Ieb,ing4 reph1oW* eeSh'ee8 Ah+ei4ai Oi0Ca,vu Esh1xe?e Wei&k4ic
ue5OhQu. aaf-i8uP eedae%T5 sei?M9Pu ieH[oh2l ieh~ah8A aev"oo9A Ohf"i8de
Foh:x2zi aLoo'qu2 Ia6aig-e La{vie1E IeFoh{cI Au_h7Hee Se)f4ebi Cah$yu7m
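Each word in a column is a password. I am trying to print every line in which any word starts and ends with the same letter, without distinguishing between uppercase and lowercase.
I know I can get part of the way with grep: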
cat passwords.txt | grep -e ' \([A-Z]\)......\1 ' -e ' \([a-z]\)......\1 '
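But with this command a word only matches when it starts and ends with the same letter in the same case (both uppercase or both lowercase), for example IeFoh{cI in this line: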
Foh:x2zi aLoo'qu2 Ia6aig-e La{vie1E IeFoh{cI Au_h7Hee Se)f4ebi Cah$yu7m
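To make the limitation concrete, here is a minimal sketch that runs the attempt above on two lines from the input. The temporary file and the `attempt` variable are illustrative choices, not part of the question.

```shell
#!/bin/sh
# Two lines from the question's input, stored in a temporary file.
# The quoted 'EOF' keeps \, $ and quotes literal.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
eif\e8AE Ieb,ing4 reph1oW* eeSh'ee8 Ah+ei4ai Oi0Ca,vu Esh1xe?e Wei&k4ic
Foh:x2zi aLoo'qu2 Ia6aig-e La{vie1E IeFoh{cI Au_h7Hee Se)f4ebi Cah$yu7m
EOF

# The attempt from the question, unchanged apart from reading the temp file
attempt=$(grep -e ' \([A-Z]\)......\1 ' -e ' \([a-z]\)......\1 ' "$tmp")
printf '%s\n' "$attempt"

rm -f "$tmp"
```

Only the second line is printed, because IeFoh{cI begins and ends with an uppercase I. The first line is missed for two reasons: in Esh1xe?e the cases differ (E and e), and eif\e8AE is the first word on the line, so it has no leading space for the pattern to match.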
Expected output
eif\e8AE Ieb,ing4 reph1oW* eeSh'ee8 Ah+ei4ai Oi0Ca,vu Esh1xe?e Wei&k4ic
sie%Qu7s Kuux"oh9 ohc9ahG% hoe8Toh: Eix*ohd1 doh:bo2U Cu0doo|t zo`L9xaW
ue5OhQu. aaf-i8uP eedae%T5 sei?M9Pu ieH[oh2l ieh~ah8A aev"oo9A Ohf"i8de
Foh:x2zi aLoo'qu2 Ia6aig-e La{vie1E IeFoh{cI Au_h7Hee Se)f4ebi Cah$yu7m
ahqu5To/ ahtie[H3 ioK&u5Ai nei1Za#d poo_Th9r gu|aGh7h uZ%io2ah IeNah&v7
Solution
With GNU grep:
grep -i -P '(?<!\S)(\S)(?:\S*\1)?(?!\S)' passwords.txt
The -i option turns on case-insensitive matching, and -P enables PCRE syntax, which supports lookbehind and lookahead.
See the regex proof.
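As a quick check, the command can be run against the question's input. This is a sketch that assumes a GNU grep built with PCRE support; the function name `same_ends` is a hypothetical wrapper, not something from the answer.

```shell
#!/bin/sh
# Hypothetical helper wrapping the command from the answer.
same_ends() {
    # -i ignores case (this also applies to the \1 backreference);
    # -P needs GNU grep built with PCRE support.
    grep -i -P '(?<!\S)(\S)(?:\S*\1)?(?!\S)' "$@"
}

# The question's input; the quoted 'EOF' keeps \, $, ` and quotes literal.
same_ends <<'EOF'
sie%Qu7s Kuux"oh9 ohc9ahG% hoe8Toh: Eix*ohd1 doh:bo2U Cu0doo|t zo`L9xaW
fie5Du[h Phe8aid# Opu&fai5 ieZ<aek6 hu4ga&Di Oose}p1p aiD@oos2 nu-a1Fub
ahqu5To/ ahtie[H3 ioK&u5Ai nei1Za#d poo_Th9r gu|aGh7h uZ%io2ah IeNah&v7
eif\e8AE Ieb,ing4 reph1oW* eeSh'ee8 Ah+ei4ai Oi0Ca,vu Esh1xe?e Wei&k4ic
ue5OhQu. aaf-i8uP eedae%T5 sei?M9Pu ieH[oh2l ieh~ah8A aev"oo9A Ohf"i8de
Foh:x2zi aLoo'qu2 Ia6aig-e La{vie1E IeFoh{cI Au_h7Hee Se)f4ebi Cah$yu7m
EOF
```

This prints every line except the one starting with fie5Du[h, in input order. These are the same five lines as the expected output, which the question lists in a different order.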
Explanation
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \S*                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \S                       non-whitespace (all but \n, \r, \t, \f,
                             and " ")
--------------------------------------------------------------------------------
  )                        end of look-ahead
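Two consequences of this breakdown are easy to miss: \S accepts any non-whitespace character, not only letters, and the optional group lets a one-character word match. The sketch below contrasts the answer's pattern with a stricter variant. The `strict` pattern and the test words are assumptions added for illustration and are not part of the original answer.

```shell
#!/bin/sh
# The pattern from the answer, kept in a variable for reuse
pattern='(?<!\S)(\S)(?:\S*\1)?(?!\S)'

# Hypothetical stricter variant (an assumption, not from the answer):
# the first character must be a letter and the word needs at least two characters
strict='(?<!\S)([a-z])\S*\1(?!\S)'

# One test word per line
words='Anna
1ab1
x
abc'

echo "answer pattern:"
printf '%s\n' "$words" | grep -i -P "$pattern"   # prints Anna, 1ab1, x

echo "strict variant:"
printf '%s\n' "$words" | grep -i -P "$strict"    # prints Anna
```

For the password data in the question every word is eight characters long, so the single-character case never arises. The digit case can arise, though, so the stricter form may be preferable if only letters should count.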