python - Regex - Word boundary not working even with raw-string
Problem description
I'm writing a set of regexes to match dates in text using Python. One of them is designed to match dates in the MM/YYYY format only. The regex is the following:
r'\b((?:(?:0)[0-9])|(?:(?:1)[0-2])|(?:(?:[1-9])))(?:\/|\\)(\d{4})\b'
It looks like the word boundary is not working, because the regex matches parts of dates like 12/02/2020 (it should not match this date format at all).
In the attached image, only the second pattern should have been recognized. No part of the first one should have matched.
Keep in mind that the regex should match the MM/YYYY pattern in strings like:
"The range of dates go from 21/02/2020 to 21/03/2020 as specified above."
Can you help me find the error in my pattern to make it match only my goal format?
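Solution
The problem is that \b\d{2}/\d{4}\b matches 02/2000 in the string 01/02/2000, because the first forward slash is a non-word character, so \b matches immediately after it. The solution is to identify the characters that must not appear directly before or after the match, and to use negative lookarounds in place of word boundaries. Here you could use the regular expression: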
r'(?<![\d/])(?:0[1-9]|1[0-2])/\d{4}(?![\d/])'
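The negative lookbehind, (?<![\d/]), prevents the two digits representing the month from being preceded by a digit or a forward slash; the negative lookahead, (?![\d/]), prevents the four digits representing the year from being followed by a digit or a forward slash.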
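As a rough illustration of the difference, the sketch below runs a word-boundary version and the lookaround version over the same text. The sample sentence is the one from the question with a trailing "Due 02/2020." added here as an assumption, so that there is one genuine MM/YYYY date to find; the variable names are illustrative only.

```python
import re

# Sample text: the question's sentence plus an assumed standalone MM/YYYY date.
text = ("The range of dates go from 21/02/2020 to 21/03/2020 "
        "as specified above. Due 02/2020.")

# Word-boundary version: "/" is a non-word character, so \b matches
# between "/" and the digit that follows it.
boundary = re.compile(r'\b(?:0[1-9]|1[0-2])/\d{4}\b')

# Lookaround version: the month must not be preceded by a digit or "/",
# and the year must not be followed by a digit or "/".
lookaround = re.compile(r'(?<![\d/])(?:0[1-9]|1[0-2])/\d{4}(?![\d/])')

boundary_matches = boundary.findall(text)
lookaround_matches = lookaround.findall(text)

print(boundary_matches)    # ['02/2020', '03/2020', '02/2020']
print(lookaround_matches)  # ['02/2020']
```

The word-boundary pattern picks up the tails of both DD/MM/YYYY dates, while the lookaround pattern keeps only the standalone MM/YYYY date.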
If 6/2000 should be matched as well as 06/2000, change (?:0[1-9] to (?:0?[1-9].
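A minimal sketch of the optional-leading-zero variant follows. The capture groups for month and year are added here for illustration (the question's original pattern captured both); they are not part of the pattern given in the answer, and the sample string is made up.

```python
import re

# Optional leading zero on the month; the two capture groups are an
# illustrative addition, not part of the answer's pattern.
pattern = re.compile(r'(?<![\d/])(0?[1-9]|1[0-2])/(\d{4})(?![\d/])')

# Assumed sample input: three valid MM/YYYY dates, an invalid month (16),
# and a full DD/MM/YYYY date that must not match.
samples = "6/2000, 06/2000, 12/2000, 16/2000, 01/02/2000"

found = pattern.findall(samples)
print(found)  # [('6', '2000'), ('06', '2000'), ('12', '2000')]
```

Note that 16/2000 is still rejected: the lookbehind stops the engine from matching 6/2000 inside it, because the 6 is preceded by a digit.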