regex - re.finditer returns no matches, although the regex works when tested separately
Problem description
Here is the regex code:
pattern="""
(?P<host>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})
(\ \-\ )
(?P<user_name>[a-z]{1,100}\d{4}|\-{1})
( \[)(?P<time>\d{2}\/[A-Za-z]{3}\/\d{4}\:\d{2}\:\d{2}\:\d{2}\ -\d{4})
(\] ")
(?P<request>.+)
(")
"""
for item in re.finditer(pattern,text,re.VERBOSE):
    # We can get the dictionary returned for the item with .groupdict()
    print(item.groupdict())
I run this code in a Jupyter Notebook.
The test text is:
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554
Solution
The main issue is that you did not escape every literal space in your pattern: the spaces in ( \[) and (\] ") are unescaped. With re.X / re.VERBOSE, any whitespace in the pattern (when outside of a character class) is treated as formatting whitespace and is ignored when the pattern is compiled. In a Python re pattern, [ ] will always match a literal space, but this is not guaranteed in other regex flavors, so the most portable way to match a space in a pattern compiled with a re.X-like flag is to escape it.
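As a minimal sketch of that behavior (the sample string "a b" and the variable names are made up for illustration), the three ways of writing the space can be compared directly:

```python
import re

text = "a b"

# Unescaped space: re.VERBOSE drops it, so the pattern is effectively "ab".
unescaped = re.search(r"a b", text, re.VERBOSE)

# Escaped space: kept as a literal space.
escaped = re.search(r"a\ b", text, re.VERBOSE)

# Space inside a character class: also kept by Python's re module.
in_class = re.search(r"a[ ]b", text, re.VERBOSE)

print(unescaped)          # None
print(escaped.group())    # a b
print(in_class.group())   # a b
```

This is the same reason re.finditer yields nothing for the original pattern: the iterator is simply empty, so the loop body never runs and no error is raised.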
Besides, there are other things to note:
- {1} is always redundant, remove it
- Repeated patterns can be grouped in a non-capturing group and quantified with an appropriate quantifier, e.g. \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} => \d{1,3}(?:\.\d{1,3}){3}
- There is no need to escape / and : (anywhere in the pattern) and - (when outside a character class) in the regex.
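A quick way to convince yourself that these simplifications do not change what is matched is to run both forms against the same input. This is only a sketch; the variable names are illustrative, and re.VERBOSE is not used here, so the plain spaces are literal:

```python
import re

ip = "146.204.224.152"
long_form = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
short_form = r"\d{1,3}(?:\.\d{1,3}){3}"
# Both forms accept the same dotted quad.
same_ip = bool(re.fullmatch(long_form, ip)) and bool(re.fullmatch(short_form, ip))

stamp = "21/Jun/2019:15:45:24 -0700"
escaped_form = r"\d{2}\/[A-Za-z]{3}\/\d{4}\:\d{2}\:\d{2}\:\d{2} \-\d{4}"
plain_form = r"\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} -\d{4}"
# Escaping /, : and - (outside a character class) makes no difference.
same_stamp = bool(re.fullmatch(escaped_form, stamp)) and bool(re.fullmatch(plain_form, stamp))

# -{1} and - match exactly the same single hyphen.
same_dash = bool(re.fullmatch(r"\-{1}", "-")) and bool(re.fullmatch(r"-", "-"))

print(same_ip, same_stamp, same_dash)  # True True True
```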
Thus, you can use
pattern = r'''(?P<host>\d{1,3}(?:\.\d{1,3}){3})
(\ -\ )
(?P<user_name>[a-z]{1,100}\d{4}|-)
(\ \[)(?P<time>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\ -\d{4})
(\]\ ")
(?P<request>.+)
(")'''
Here is a complete Python demo:
import re
text = '''146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622
197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554'''
pattern = r'''(?P<host>\d{1,3}(?:\.\d{1,3}){3})
(\ -\ )
(?P<user_name>[a-z]{1,100}\d{4}|-)
(\ \[)(?P<time>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\ -\d{4})
(\]\ ")
(?P<request>.+)
(")'''
for item in re.finditer(pattern, text, re.VERBOSE):
    print(item.groupdict())  # .groupdict() returns the named groups as a dictionary
Output:
{'host': '146.204.224.152', 'user_name': 'feest6811', 'time': '21/Jun/2019:15:45:24 -0700', 'request': 'POST /incentivize HTTP/1.1'}
{'host': '197.109.77.178', 'user_name': 'kertzmann3129', 'time': '21/Jun/2019:15:45:25 -0700', 'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'}
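Since re.VERBOSE also allows # comments inside the pattern, the same regex can be laid out with one annotated piece per line. The following is a sketch rather than the answer's exact code: the names LOG_LINE and parse_line are hypothetical, and the separators are left uncaptured here, which does not affect .groupdict() because it only reports named groups.

```python
import re

# Hypothetical compiled form of the pattern above, with verbose-mode comments.
LOG_LINE = re.compile(r'''
    (?P<host>\d{1,3}(?:\.\d{1,3}){3})      # IPv4-looking host
    \ -\                                    # separator, both spaces escaped
    (?P<user_name>[a-z]{1,100}\d{4}|-)      # user name, or a hyphen when absent
    \ \[
    (?P<time>\d{2}/[A-Za-z]{3}/\d{4}:\d{2}:\d{2}:\d{2}\ -\d{4})
    \]\ "
    (?P<request>.+)                         # greedy: runs to the last quote on the line
    "
''', re.VERBOSE)


def parse_line(line):
    """Return the named groups for one log line, or None if it does not match."""
    match = LOG_LINE.search(line)
    return match.groupdict() if match else None


sample = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
parsed = parse_line(sample)
print(parsed["host"], parsed["user_name"])  # 146.204.224.152 feest6811
```

Returning None for a non-matching line makes a silent mismatch visible, which is easier to debug than an empty re.finditer loop.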