python - Trying to build a dictionary using re.VERBOSE
Question
I am trying to write a pattern that turns a string like this into a dictionary:
string= '30.95.91.251 - larson8319 [21/Jun/2019:16:02:02 -0700] "PUT /one-to-one/whiteboard HTTP/1.0" 401 7270'
so that the result looks like:
dic= {"host":"30.95.91.251",
"user_name":"larson8319",
"time":"21/Jun/2019:16:02:02 -0700",
"request":"PUT /one-to-one/whiteboard HTTP/1.0"}
using this code:
import re

pattern = r'''
(?P<host>.*)
(-\ )
(?P<user_name>\w*)
(?P<time>\W.+)
(?P<request>\w+)
'''
for item in re.finditer(pattern, string, re.VERBOSE):
    print(item.groupdict())
But I can't get the brackets to disappear or get the request part matched properly.
Solution
Be more specific and use character classes ([...]):
(?P<host>[\d.]+)[-\s]+
(?P<user_name>\w+)\s+
\[(?P<time>[^][]+)\]\s+
"(?P<request>[^"]+)"
See a demo on regex101.com.
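As a minimal sketch, the pattern can be compiled with re.VERBOSE and applied to the sample line from the question as follows. The inline comments inside the pattern and the variable name result are additions for illustration only; the pattern itself is the one given in this answer.

```python
import re

# Sample log line from the question.
string = '30.95.91.251 - larson8319 [21/Jun/2019:16:02:02 -0700] "PUT /one-to-one/whiteboard HTTP/1.0" 401 7270'

# With re.VERBOSE, whitespace and "#" comments inside the pattern are
# ignored (except inside character classes), so it can span several lines.
pattern = re.compile(r'''
    (?P<host>[\d.]+)[-\s]+        # dotted IP address, then the dash separator
    (?P<user_name>\w+)\s+         # user name
    \[(?P<time>[^][]+)\]\s+       # everything between the square brackets
    "(?P<request>[^"]+)"          # everything between the double quotes
''', re.VERBOSE)

# groupdict() returns only the named groups, keyed by group name.
result = [m.groupdict() for m in pattern.finditer(string)]
print(result)
```

Because the brackets and quotes sit outside the named groups, they are matched but not captured, which is what keeps them out of the resulting dictionary.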
Alternatively, use a parser altogether:
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

string = '30.95.91.251 - larson8319 [21/Jun/2019:16:02:02 -0700] "PUT /one-to-one/whiteboard HTTP/1.0" 401 7270'

class LogVisitor(NodeVisitor):
    grammar = Grammar(
        r"""
        entry   = ip user time request rest
        ip      = ~"[\d.]+" junk
        user    = ~"\w+" junk
        time    = "[" ~"[^][]+" "]" junk
        request = '"' ~"[^\"]+" '"'
        junk    = ~"[-\s]*"
        rest    = ~".*"
        """
    )

    def generic_visit(self, node, visited_children):
        return visited_children or node

    def visit_entry(self, node, visited_children):
        ip, user, time, request, *_ = visited_children
        return dict([ip, user, time, request])

    def __clean__(self, node, visited_children, first=False):
        # first=True skips the leading delimiter ("[" or '"')
        if first:
            _, what, *_ = visited_children
        else:
            what, *_ = visited_children
        return what.text

    def visit_ip(self, node, visited_children):
        return ('ip', self.__clean__(node, visited_children))

    def visit_user(self, node, visited_children):
        return ('user', self.__clean__(node, visited_children))

    def visit_time(self, node, visited_children):
        return ('time', self.__clean__(node, visited_children, True))

    def visit_request(self, node, visited_children):
        return ('request', self.__clean__(node, visited_children, True))

lv = LogVisitor()
result = lv.parse(string)
print(result)
This yields
{'ip': '30.95.91.251', 'user': 'larson8319', 'time': '21/Jun/2019:16:02:02 -0700', 'request': 'PUT /one-to-one/whiteboard HTTP/1.0'}